Systems and methods for risk factor predictive modeling with model explanations

ABSTRACT

A suite of fluidless predictive machine learning models includes a fluidless mortality module, smoking propensity model, and prescription fills model. The fluidless machine learning models are trained against a corpus of historical underwriting applications of a sponsoring enterprise, including clinical data of historical applicants. Fluidless models are trained by application of a random forest ensemble including survival, regression, and classification models. The trained models produce high-resolution, individual mortality scores. A fluidless underwriting protocol runs these predictive models to assess mortality risk and other risk attributes of a fluidless application that excludes clinical data to determine whether to present an accelerated underwriting offer. If any of the fluidless predictive models determines a high risk target, the applicant is required to submit clinical data, and an explanation model generates an explanation file for user interpretability of any high risk model prediction and the adverse underwriting decision.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims benefit of U.S. Provisional App. No. 62/848,397, filed May 15, 2019, claims the benefit of U.S. Provisional App. No. 62/899,543, filed Sep. 12, 2019, and claims the benefit of U.S. Provisional App. No. 62/993,584 filed Mar. 23, 2020, all of which are incorporated by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates generally to predictive modeling.

BACKGROUND

With the advent of machine learning techniques, algorithmic systems can be quite complex. The algorithmic systems do not provide adequate transparency into factors affecting decisions and scoring. Additionally, from a user's standpoint, algorithmic processing can be opaque, e.g., a black box process in which the user enters items of information just to see the computer system produce a result. In determining whether to train a model, revise an algorithm, or deploy a new model, more insight is needed into the critical factors affecting the output of the model. It can be very difficult to explain decisions resulting from a machine learning model to affected users in a way that allows users to trust the algorithmic processing. Understanding the reasons behind model predictions is desirable.

SUMMARY

What is needed are systems and methods that provide qualitative understanding and quantitative understanding of underwriting model predictions. Another need is for systems and methods that provide persons affected by underwriting decisions with a specific reason or reasons for any adverse underwriting decision. An additional need is systems and methods that provide qualitative understanding and quantitative understanding of algorithmic underwriting models for enterprise users of these models, such as underwriters and developers. An additional need is for a tool to enable model developers to check algorithmic underwriting models for inconsistencies and undesirable behavior.

Embodiments described herein can improve customer experience with faster processing and reduced customer burdens of providing information required by the underwriting process, while providing fair and transparent algorithmic underwriting. A fluidless underwriting model suite for predictive modeling of mortality treats clinical assessment data as excluded risk factors while enabling assessment and classification of risk with respect to applicants for life insurance according to acceptable alternative criteria. The fluidless underwriting protocol generates and displays explanations of model outputs including predictions and underwriting decisions.

In an embodiment, a processor-based method receives from a user device a plurality of variables of an electronic application to an enterprise, wherein the plurality of variables for the electronic application exclude clinical data for an applicant of the electronic application. Upon receiving the plurality of variables for the electronic application from the user device, the processor retrieves public data identified with the applicant from one or more third-party sources. The processor executes a first predictive machine learning module configured to determine a first risk rank representative of a mortality risk for the electronic application and to classify the electronic application into one of a first high risk group and a first low risk group based upon the first risk rank. In various embodiments, the first predictive machine learning module comprises a machine learning model utilizing one or more of survival modeling, regression modeling, and classification modeling. The processor executes a second predictive machine learning module configured to determine a second risk rank and to classify the electronic application into one of a second high risk group and a second low risk group based upon the second risk rank. The processor executes a third predictive machine learning module configured to determine a third risk rank and to classify the electronic application into one of a third high risk group and a third low risk group based upon the third risk rank.

In the event the processor classifies the electronic application into all of the first low risk group, the second low risk group, and the third low risk group, the processor generates a user interface for display of an accelerated application offer at the user device. In the event the processor classifies the electronic application into one or more of the first high risk group, the second high risk group, and the third high risk group, the processor generates a user interface for display at the user device of an explanation file including interpretability data for the predictive machine learning model prediction of the one or more of the first high risk group, the second high risk group, and the third high risk group.

In an embodiment, the method determines whether to generate the user interface for display of the accelerated application or to display an underwriting decision declining the fluidless application and instructing the applicant to submit clinical data in any resubmitted underwriting application.

In various embodiments, the processor executes an explanation model to generate an explanation file for the electronic application. The explanation file is a textual and/or visual artifact that provides a quantitative understanding and/or qualitative understanding of predictions by the machine learning model.

In various embodiments, the method inputs data associated with the electronic application into an additive feature attribution module in order to generate the explanation file of the predictive machine learning module outputs. Inputted data include features data representative of at least some of the variables of the electronic application, model object data representative of the machine learning model, and model prediction data representative of predictive machine learning module outputs. In an embodiment, the method generates a report for the electronic application for display by a user interface, the report including the quantitative score, underwriting decision data derived from the underwriting decision file, and explanation data derived from the explanation file for the electronic application.

In an embodiment, the additive feature attribution module executes a SHAP values (SHapley Additive exPlanation) algorithm. In an embodiment, the additive feature attribution module executes a Kernel SHAP algorithm. In an embodiment, the additive feature attribution module executes a Tree SHAP algorithm.

In an embodiment, each of a plurality of historical application records includes clinical assessment data for an applicant of the respective historical application record. Prior to inputting the historical application records into the machine learning model ensemble, the method supplements each historical application record with public data identified with the applicant of the respective historical application record. In an embodiment, the public data comprises public records and credit risk data. In various embodiments, the first predictive machine learning model is continuously trained using updated public records and credit data.

In various embodiments, the second risk rank is representative of a propensity of the applicant of the electronic application to be a smoker. In an embodiment, the second predictive machine learning module is a random forest classification model configured to estimate the propensity of the applicant of the electronic application to be a smoker.

In various embodiments, the third risk rank is representative of prescription drug data for the applicant of the electronic application. In an embodiment, the third predictive machine learning module is configured to determine disqualifying medical risks based on information derived from prescription drug fills for the applicant of the electronic application.

In an embodiment, the first risk rank comprises a quantitative score representative of the mortality risk for the electronic application. In various embodiments, one or more of the first risk rank, the second risk rank, and the third risk rank comprises a percentile within a score distribution for a population of customers of the enterprise.

In an embodiment, a method for processing an electronic application comprises receiving, by a processor, a plurality of variables of an electronic application from a user device, wherein the plurality of variables for the electronic application exclude clinical data for an applicant; upon receiving the plurality of variables for the electronic application from the user device, retrieving, by the processor, public data identified with the applicant of the electronic application from one or more third-party sources; executing, by the processor, a first predictive machine learning module to determine a first risk rank representative of a mortality risk for the electronic application and to classify the electronic application into one of a first high risk group and a first low risk group based upon the first risk rank; executing, by the processor, a second predictive machine learning module to determine a second risk rank and to classify the electronic application into one of a second high risk group and a second low risk group based upon the second risk rank; executing, by the processor, a third predictive machine learning module to determine a third risk rank and to classify the electronic application into one of a third high risk group and a third low risk group based upon the third risk rank; and when the processor classifies the electronic application into one or more of the first high risk group, the second high risk group, and the third high risk group, generating, by the processor, an explanation file for display on a user interface, the explanation file including interpretability data based on the determination of the one or more of the first high risk group, the second high risk group, and the third high risk group.

In another embodiment, a system comprises an analytical engine containing a processor configured to execute a plurality of non-transitory computer-readable instructions configured to receive a plurality of variables for an electronic application from a user device that excludes clinical data for an applicant of the electronic application, and to retrieve public data identified with the applicant of the received electronic application from one or more third-party sources; execute a predictive machine learning module to determine a mortality risk rank for the electronic application and classify the electronic application into a first low risk group or a first high risk group; execute a smoking propensity predictive model, wherein the smoking propensity model is configured to estimate a propensity of the applicant of the electronic application to be a smoker and determine a smoking/non-smoking binary target; execute a prescription drug data predictive model configured to determine a disqualifying medical risk based on information derived from prescription drug fills for the applicant of the electronic application; and generate and present a user interface that displays information associated with an accelerated application offer when the analytical engine server classifies the electronic application into the first low risk group, determines the non-smoking binary target, and does not determine the disqualifying medical risk; and that displays an explanation file including interpretability data when the analytical engine server effects one or more of the following: classifies the electronic application into the first high risk group, determines the smoking binary target, determines the disqualifying medical risk.

Other objects, features, and advantages of the present disclosure will become apparent with reference to the drawings and detailed description of the illustrative embodiments that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting embodiments of the present disclosure are described by way of example with reference to the accompanying figures which are schematic and are not intended to be drawn to scale. Unless indicated as representing the background art, the figures represent aspects of the disclosure.

FIG. 1 is a system architecture for a machine learning underwriting system including an explanation model, according to an embodiment.

FIG. 2 is a schematic diagram of databases, models, and model outputs of a machine learning underwriting system with underwriting model suite and explanations, according to an embodiment.

FIG. 3 illustrates a method for appending public records and credit risk data to historical application records, according to an embodiment.

FIG. 4 is a schematic diagram of input data sources and models of a fluidless underwriting method, according to an embodiment.

FIG. 5 is a flow chart diagram of data preprocessing procedures of a predictive machine learning module, according to an embodiment.

FIG. 6A displays a relationship between an individual's number of collections and input address length of residence, according to an embodiment.

FIG. 6B displays a relationship between an individual's number of collections and input address length of residence, according to an embodiment.

FIG. 6C displays a relationship between an individual's total number of accounts and smoking incidence, according to an embodiment.

FIG. 6D displays a relationship between an individual's number of derogatory accounts and smoking incidence, according to an embodiment.

FIG. 7 is a flow chart diagram of a fluidless underwriting protocol, according to an embodiment.

FIG. 8 displays an explanation of a machine learning underwriting model prediction including additive contributions to a risk score, according to an embodiment.

FIG. 9 shows a screen shot of an interactive dashboard of an explanation model for a machine learning underwriting system, according to an embodiment.

DETAILED DESCRIPTION

In the following detailed description, reference is made to the accompanying drawings, which depict non-limiting, illustrative embodiments of the present disclosure. Other embodiments may be utilized and logical variations, e.g., structural and/or mechanical, may be implemented without departing from the scope of the present disclosure. To avoid unnecessary detail, certain information, items, or details known to those skilled in the art may be omitted from the following.

Underwriting is the process an insurance company uses to determine whether or not a potential customer is eligible for insurance, and what rate that potential customer should pay for the insurance if eligible. Insurance underwriting seeks to spread risk among a pool of insured in a manner that is both fair to the customer and profitable for the insurer. One consideration is that it does not make sense for insurers to sell life insurance, for example, to everyone who applies for it. Additionally, although insurance companies do not intend to charge their customers excessively high rates, it is not prudent for them to charge all their policyholders the same premium. Underwriting enables the company to decline coverage to certain applicants, and to charge the remaining applicants premiums and to provide other policy terms that are commensurate with their level of risk.

Traditionally, underwriting has been a manual process. Underwriting can involve numerous people including agents and doctors, and it can be very time-consuming. Therefore, various entities have developed systems and methods to automate the underwriting process in order to improve decision-making, reduce the number of people involved, and accelerate the underwriting process. These systems and methods may be referred to as algorithmic underwriting. With the advent of machine learning techniques, algorithmic underwriting systems can be quite complex. It can be challenging to describe the methodology that resulted in underwriting decisions, such as a decision to decline coverage, a decision to charge certain premiums, or a decision to impose a policy limitation. Therefore, it can be very difficult to explain underwriting decisions to affected customers in a way that these customers can trust the process. Understanding the reasons behind model predictions is quite important in assessing trust. Trust in individual model predictions is fundamental if one is affected by the model predictions.

Traditionally, most types of life insurance require an estimate of the expected lifetime of an individual at the time of application, commonly called the “mortality risk.” Conventional protocols for collecting and analyzing data that describes mortality risk are known as underwriting. Actuaries compute the cost of covering mortality risk over the lifetime of the policy and translate it into a set of premium payments required throughout a policy's term. Life insurance risk assessment has primarily consisted of point systems developed by medical doctors and experienced underwriters. Such protocols commonly calculate risk by mapping medical and behavioral attributes to point values that either debit or credit an overall score. A life insurance underwriter reviews an application to calculate the net number of points and to determine one of several risk classes that determine pricing according to aggregate mortality.
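The conventional debit/credit point system described above can be illustrated with a minimal sketch. The attribute names, point values, and class cutoffs below are hypothetical placeholders chosen for illustration only; they are not taken from the present disclosure or from any actual underwriting manual.

```python
from typing import Dict, List, Tuple

# Hypothetical point values (assumptions, not from the disclosure):
# positive values are debits (more risk), negative values are credits.
POINT_TABLE: Dict[str, int] = {
    "tobacco_use": 75,
    "elevated_blood_pressure": 50,
    "regular_exercise": -25,
}

# Hypothetical cutoffs as (maximum net points, class label), ascending.
CLASS_CUTOFFS: List[Tuple[int, str]] = [
    (0, "preferred"),
    (75, "standard"),
    (150, "substandard"),
]


def net_points(attributes: Dict[str, bool]) -> int:
    """Sum debits and credits for every attribute that applies."""
    return sum(
        POINT_TABLE[name]
        for name, present in attributes.items()
        if present and name in POINT_TABLE
    )


def risk_class(points: int) -> str:
    """Map a net point total to the first class whose cutoff it fits."""
    for max_points, label in CLASS_CUTOFFS:
        if points <= max_points:
            return label
    return "decline"  # above every cutoff


applicant = {"tobacco_use": True, "regular_exercise": True}
total = net_points(applicant)
print(total, risk_class(total))
```

The sketch shows why such systems yield only coarse-grained classes: many distinct attribute combinations collapse onto the same net total and therefore the same class.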

The system and method of the present disclosure represent an underwriting protocol that improves customer experience with faster processing and reduced customer burdens of providing information required by the underwriting process. The underwriting protocol eliminates the traditional requirement in underwriting of life insurance products of collection of body fluids and various physical measurements, and analysis of risk factors based on these inputs. In the present disclosure, the underwriting protocol and the application received from the user are sometimes called “fluidless” (e.g., fluidless underwriting, fluidless application).

As used in the present disclosure, a “risk factor” is any variable associated with a health outcome or state, such as a risk of disease, infection and/or health-related event, e.g., a stroke, diabetes, heart attack, cancer and death. Risk factors may be correlated with a health outcome or state and may have a causal relationship with the health outcome or state. In the present disclosure, the omitted or excluded risk factors are risk factors derived from collection and analysis of body fluids and various biophysical measurements. In the present disclosure, these are sometimes called “clinical assessment risk factors” or alternatively “clinical assessments,” and the collected medical data for these clinical assessments (e.g., body fluids and biophysical measurements) are sometimes called “clinical data,” or alternatively “clinical laboratory data.”

In lieu of clinical assessments as inputs to mortality predictions, the system and method of the present disclosure employ a mortality predictive model trained using data from a large corpus of historical applications based on traditional underwriting protocols, in conjunction with public data sources that can provide a thorough view of prospective customers. The system and methods of the present disclosure receive as input applications of prospective customers that exclude clinical data, and apply fluidless mortality predictive modeling to determine whether to approve sale to the applicant of a risk-pooling product, such as life insurance. If the fluidless underwriting protocol does not result in approval of the fluidless application, the applicant can submit clinical data in an application to be underwritten inclusive of those risk factors.

Clinical data collected in medical examinations in support of conventional applications for life insurance are typically employed to assess the applicant's health, to confirm information included in the application, and to screen for illegal drug use. Much of the collected clinical data is also obtained from other sources during the application process, and clinical test results and answers to examination questions are typically checked for consistency with the other sources.

Clinical laboratory data are a point-in-time view into an individual's health. Underwriting ties various clinical data to all-cause mortality predictions and to specific causes of mortality. Clinical assessments based on collected blood and urine samples typically test the collected fluids to screen for dozens of indicators of diseases and conditions (health indicators). Examples of clinical assessment risk factors include HIV and AIDS; sexually transmitted diseases (STD); cholesterol (including LDL and HDL) and triglycerides (e.g., as indicators of heart disease risk factors); hemoglobin A1C, fructosamine and glucose levels (e.g., as indicators of diabetes); creatinine, hemoglobin and proteins (e.g., as indicators of kidney disease); and urine acidity (e.g., as an indicator of impaired kidney function or diabetes). Typical medical examinations also screen for nicotine and cotinine in the urinalysis in order to determine tobacco usage. Additionally, clinical assessments may include biophysical examinations such as weighing the applicant and questioning the applicant, e.g., about lifestyle.

While excluding such clinical assessments eliminates informative indicators of risk factors that can yield substantial protective value in risk selection, the fluidless underwriting protocols of the present disclosure identify low-risk applicants for whom traditional clinical assessments can be waived with little to no impact on mortality risk. In lieu of clinical laboratory data, fluidless underwriting protocols disclosed herein utilize nontraditional data sources (public records and credit risk) that yield information on insurance applicants' behavior providing significant insights into mortality risk. In addition to those nontraditional data sources, fluidless underwriting protocols disclosed herein can include a client medical interview in the application process.

A fluidless underwriting protocol was validated by simulating a set of applications approved by the protocol, also herein called a “book of business.” The simulation compared the book of business with historical underwriting risk class offers that effectively control all primary actuarial factors. This simulation showed that fluidless underwriting protocols incorporating the excluded risk-factor predictive modeling systems and methods of the present disclosure generate substantially improved offer rates based on accelerated underwriting without compromising mortality margins of conventional underwriting protocols.

The system and method of the present disclosure apply machine learning methods to underwriting protocols. Complex machine learning methods have become common in decision making systems in industry and government. When these methods are deployed in settings that affect human data subjects, an increasing concern of users and regulators is that they may have difficulty in oversight of mathematical tools. This has resulted in calls for greater transparency. In Europe, the General Data Protection Regulation gives individuals the right to “meaningful information about the logic involved” in algorithmic decision-making. In applying algorithmic decision-making methods in the life insurance industry, various regulatory authorities in the United States have required insurance companies to provide insured or potential insured with specific reasons for adverse underwriting decisions.

Underwriting protocols provide users such as underwriters, developers, and consumers with explanations for algorithmic underwriting outcomes. As used in the present disclosure, “explanations” are textual and visual artifacts that provide understanding of model predictions. An explanation can provide an understanding of an individual prediction of a machine learning model. The explanation can provide an understanding of the machine learning model itself. A machine learning model can provide quantitative predictions, and the explanation can provide one or both of qualitative understanding and quantitative understanding of the model's quantitative predictions. Explanations of machine learning models are commonly associated with “interpretability.” As used herein, “interpretability” denotes the degree to which a human can understand the cause of a decision by a machine learning model. Properties of an interpretable model can include that a human can repeat the model's computation process with a full understanding of the algorithm.

As used in the present disclosure, an “explanation model,” also herein called an “interpretability model,” treats explanations of predictions of a machine learning model as a model itself. In various embodiments, an explanation model implements additive feature attribution for interpretability of underwriting model outcomes. Additive feature attribution provides various desirable properties in an explanation method. In various embodiments, an explanation model using additive feature explanation provides users with real-time explanations of underwriting model predictions. In various embodiments, an explanation model includes a transparency tool that enables users to easily visualize signals picked up by a machine learning underwriting model. Model developers can use the transparency tool in order to check a machine learning underwriting model for inconsistencies and undesirable behavior.
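The idea of additive feature attribution can be made concrete with a small sketch that computes exact Shapley values by enumerating feature subsets. This is an illustration of the underlying principle only: it is not the Kernel SHAP or Tree SHAP algorithm, and it scales exponentially with the number of features. The toy scoring function, the feature names, and the choice of replacing "absent" features with a baseline value are all assumptions made for this example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(
    model: Callable[[Features], float],
    x: Features,
    baseline: Features,
) -> Dict[str, float]:
    """Exact Shapley attribution of model(x) - model(baseline)."""
    names = list(x)
    n = len(names)

    def value(subset: frozenset) -> float:
        # Features in the subset take the applicant's values;
        # the rest are held at the baseline (an assumed convention).
        mixed = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return model(mixed)

    phi: Dict[str, float] = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = frozenset(combo)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_score(f: Features) -> float:
    # Hypothetical risk score with an interaction term.
    c, d = f["collections"], f["derogatory_accounts"]
    return 0.2 * c + 0.1 * d + 0.05 * c * d


applicant = {"collections": 2.0, "derogatory_accounts": 4.0}
reference = {"collections": 0.0, "derogatory_accounts": 0.0}
contributions = shapley_values(toy_score, applicant, reference)
print(contributions)
```

The per-feature contributions sum to the difference between the applicant's score and the baseline score, which is the additive property that lets an explanation present a score as a base value plus signed contributions.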

Disclosed embodiments of a fluidless underwriting method apply a suite of predictive models to model inputs for a fluidless application in order to determine whether to present an accelerated underwriting offer to the applicant. In order to receive approval for presentation of an accelerated underwriting offer, the application must pass all model components of the fluidless model suite. In the event the application is declined, an explanation model generates one or both of a holistic explanation and a modular explanation for display to the applicant. As used in the present application, a “holistic” explanation is aimed at interpretability of the fluidless predictive model suite as a whole, including understanding the decision whether to present an accelerated underwriting offer to the applicant. As used in the present application, a “modular” explanation is aimed at interpretability of a prediction by a particular component model of the fluidless predictive model suite, e.g., in the event the application is declined because the application failed to pass that component model.
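The pass-all-components rule described above can be sketched as follows. The class name, function name, model labels, and the shape of the returned dictionary are hypothetical and are used only to illustrate the control flow; the disclosure does not prescribe this data structure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelResult:
    model_name: str   # e.g. "mortality", "smoking", "prescription"
    high_risk: bool   # True if this component classified the case as high risk


def fluidless_decision(results: List[ModelResult]) -> Dict[str, object]:
    """Offer acceleration only if every component model passes."""
    failed = [r.model_name for r in results if r.high_risk]
    if not failed:
        return {"decision": "accelerated_offer", "explain": []}
    # One modular explanation per failing component; a holistic
    # explanation would cover the suite-level decision as a whole.
    return {"decision": "clinical_data_required", "explain": failed}


outcome = fluidless_decision([
    ModelResult("mortality", high_risk=False),
    ModelResult("smoking", high_risk=True),
    ModelResult("prescription", high_risk=False),
])
print(outcome)
```

Returning the list of failing components alongside the decision keeps the explanation step decoupled from the classification step, so the explanation model can be run only for the components that triggered the decline.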

Disclosed embodiments of a fluidless underwriting method apply on demand a fluidless mortality predictive model that has been trained against a large corpus of historical underwriting applications including clinical assessment data. During model training the method executes a predictive machine learning model configured to determine a mortality score for each historical application record of a plurality of historical application records stored in a historical application database. The method effects feature transformations on various attributes of historical application records to construct engineered features with improved impacts on predicted value. Additionally, the predictive machine learning model effects a missingness procedure that provides imputed values for missing values in the historical application data. The predictive machine learning model is configured to determine the set of mortality scores by inputting engineered features and the customer profile data into a suite of predictive models based on survival, regression, and classification tasks. In an example, this suite of models uses the random forest algorithm.
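One common way to handle missing values before training is sketched below. This passage does not specify the imputation rule used by the missingness procedure, so median imputation with an added missing-value indicator is an assumption chosen for illustration, and the function and field names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def impute_with_indicator(records: List[Record], field: str) -> List[Dict[str, float]]:
    """Fill missing values of `field` with the observed median and
    add a 0/1 indicator column recording which values were imputed."""
    observed = [r[field] for r in records if r.get(field) is not None]
    if not observed:
        raise ValueError(f"no observed values for field {field!r}")
    fill = median(observed)

    out = []
    for r in records:
        missing = r.get(field) is None
        new = dict(r)
        new[field] = fill if missing else r[field]
        new[f"{field}_missing"] = int(missing)
        out.append(new)
    return out


history = [
    {"address_years": 1.0},
    {"address_years": None},
    {"address_years": 3.0},
]
print(impute_with_indicator(history, "address_years"))
```

Keeping the indicator column lets a downstream tree-based model learn from the fact that a value was missing, which can itself carry signal.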

FIG. 1 shows a system architecture for a fluidless application review system 100, also herein called a “fluidless underwriting system,” of a sponsoring enterprise. The fluidless underwriting system 100 may be hosted on one or more computers (or servers), and the one or more computers may include or be communicatively coupled to one or more databases. The application review system 100 manages predictive modeling of mortality risk factors that exclude clinical assessment risk factors for applicants for life insurance or other financial products of the sponsoring enterprise.

A sponsoring enterprise for interpretable underwriting system 100 can be an insurance company or other financial services company, which may be represented by insurance agents or advisors. In some cases, an insurance agent may be associated with only a single insurance provider (sometimes referred to as a “captive” insurance agent). In other cases, an “independent” insurance agent, sometimes called an “insurance broker,” may be associated with several different insurance providers. A user (customer or customer representative) can submit a digital application via user device 180, and the digital application received by interpretable underwriting system 100 can be assigned to an agent or advisor.

Fluidless underwriting analytical module 110 includes an analytical engine 114 and an algorithmic rules engine submodule 118. An example algorithmic rules engine submodule 118 executes thousands of automated rules encompassing health, behavioral, and financial attributes collected through digital fluidless applications 222 and through real-time vendor APIs 190. As used herein, a module may represent functionality (or at least a part of the functionality) performed by a server and/or a processor. For instance, different modules may represent different portions of the code executed by the analytical engine server to achieve the results described herein. Therefore, a single server may perform the functionality described as being performed by separate modules.

Analytical engine 114 can be executed by a server, one or more server computers, authorized client computing devices, smartphones, desktop computers, laptop computers, tablet computers, PDAs and other types of processor-controlled devices that receive, process, and/or transmit digital data. Analytical engine 114 can be implemented using a single-processor system including one processor, or a multi-processor system including any number of suitable processors that may be employed to provide for parallel and/or sequential execution of one or more portions of the techniques described herein. Analytical engine 114 performs these operations as a result of a central processing unit executing software instructions contained within a computer-readable medium, such as within memory. In one embodiment, the software instructions of the system are read into memory associated with the analytical engine 114 from another memory location, such as from a storage device, or from another computing device via a communication interface. In this embodiment, the software instructions contained within memory instruct the analytical engine 114 to perform processes described below. Alternatively, hardwired circuitry may be used in place of, or in combination with, software instructions to implement the processes described herein. Thus, implementations described herein are not limited to any specific combinations of hardware circuitry and software.

In various embodiments, underwriting models 154 apply machine learning predictive modeling to enterprise data 140 and to third-party data 190 to derive model outcomes 164. In various embodiments, model outcomes 164 include risk scoring information and underwriting decisions for users who have submitted a fluidless application for insurance. In various embodiments, underwriting decisions include an application decision for the fluidless application. In various embodiments, underwriting decisions include a decision to present an application offer file for the fluidless application. In various embodiments, underwriting decisions include a decision to decline to present an application offer file for the fluidless application.

In various embodiments, model outcomes 164 include risk scoring information, also herein called “risk ranks.” Risk ranks may include, for example, quantitative risk scores, percentiles, binary risk outcomes, and risk classes. In an example, risk ranks include the user's percentile within the score distribution for a population of general users, together with the score of the particular user. Risk scoring can be a binary outcome, such as “pass” or “fail.” Risk scores can define one or more bins as percentile ranges in a percentile distribution for a population of general users. Risk scoring can rank cases by the likelihood of belonging to one risk class or the other. Risk scoring can determine a quantitative risk score, such as a net number of points, for the user and translate this risk score into one of several coarse-grained risk classes. Risk ranks can include a risk class to which the user has been assigned. For example, the user may be assigned to UPNT for non-smokers or SPT for self-reported smokers. In some cases, a sponsoring enterprise may deny coverage for an applicant with a risk rank representing a very high medical or financial risk.
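In a non-limiting illustration, the percentile and bin forms of risk rank described above might be computed as in the following minimal sketch. The function names, the reference population scores, and the bin boundaries and labels are hypothetical and are not taken from the disclosure.

```python
from bisect import bisect_right


def percentile_rank(score, population_scores):
    """Percentage of the reference population scoring at or below `score`."""
    ordered = sorted(population_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def risk_bin(percentile, upper_bounds=(25.0, 75.0), labels=("low", "medium", "high")):
    """Map a percentile to a coarse bin (bounds and labels are illustrative only)."""
    for bound, label in zip(upper_bounds, labels):
        if percentile <= bound:
            return label
    return labels[-1]


# Made-up scores for a population of general users.
population = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
user_percentile = percentile_rank(35, population)
user_bin = risk_bin(user_percentile)
```

Under these assumptions, the risk rank reported for a user could carry both the percentile and the bin, alongside the raw score.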

In parallel with predictive modeling of underwriting, an explanation model 158, also herein called “explanation method” or “interpretability model,” generates explanations 168, also herein called explanation files, of model outcomes. Explanation model 158 can generate an explanation file for display on user device 180. An explanation file displayed by the user device can include interpretability data based on predictive machine learning modeling by underwriting models 154. Additionally, the displayed explanation file can include risk scoring information and underwriting decisions 164.

In the underwriting system embodiment of FIG. 2, enterprise databases 220 consist of various databases under custody of a sponsoring enterprise, including fluidless applications database 222, historical applications database 224, and customer database 226. Enterprise databases 220 are organized collections of data, stored in non-transitory machine-readable storage. The databases may execute or may be managed by database management systems (DBMS), which may be computer software applications that interact with users, other applications, and the database itself, to capture (e.g., store data, update data) and analyze data (e.g., query data, execute data analysis algorithms). In some cases, the DBMS may execute or facilitate the definition, creation, querying, updating, and/or administration of databases. The databases may conform to a well-known structural representational model, such as relational databases, object-oriented databases, and network databases. Example database management systems include MySQL, PostgreSQL, SQLite, Microsoft SQL Server, Microsoft Access, Oracle, SAP, dBASE, FoxPro, IBM DB2, LibreOffice Base, and FileMaker Pro. Example database management systems also include NoSQL databases, i.e., non-relational or distributed databases that encompass various categories: key-value stores, document databases, wide-column databases, and graph databases.

A suite of fluidless models 240 includes a Fluidless Mortality Model 242, Smoking Propensity Model 244, and Prescription Fills Model 246. In an example, fluidless models 240 were trained against a large corpus of historical underwriting applications 226 of a sponsoring enterprise. With further reference to FIG. 1, data acquisition and transformations module 124 applied a data append procedure and data transformation procedures to the historical application data to yield an extensive data set with engineered features having improved predictive values. Fluidless models 240 were then trained by application of models within random forest survival models ensemble 130. Model training curated a data set on the scale of one million historical applications, wherein the historical applications included then-current clinical assessment data of the applicants. The trained models produced high-resolution, individual mortality scores.

In an example, historical underwriting applications 276 included data obtained from an extended time period. This presented the modeling challenge of taking into account temporal factors, such as a decreasing trend of certain lab values over the time period of the historical applications. Example modeling techniques of the disclosure applied a statistical adjustment to account for covariate shift or non-stationarity, i.e., differences in distribution of certain predictive variables over the relevant time period.

In parallel with predictive modeling of underwriting, explanation methods 250, also herein called “explanation models” or “interpretability models,” generate explanations 270 of model outcomes. Example methods 250 include holistic method 254, which generates holistic level explanations 274 of model outcomes, and modular method 258, which generates modular level explanations 278 of model outcomes.

Explanation methods incorporate additive feature attribution. Additive feature attribution describes the sensitivity of underwriting machine learning models 240 to different values in a rigorous manner. An example additive feature attribution module employs a SHAP (SHapley Additive exPlanations) values algorithm. The additive feature attribution module can execute a Kernel SHAP algorithm. The additive feature attribution module can execute a Tree SHAP algorithm. Additive feature attributions can generate explanations of various outputs of random forest survival models 240, including for example explanations of raw output of the tree model, output of the model transformed into probability space, and model performance metrics broken down by feature.
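To illustrate what an additive feature attribution computes, the following minimal sketch evaluates exact Shapley values by brute-force enumeration of feature coalitions. It is not the Kernel SHAP or Tree SHAP algorithm, which estimate or accelerate the same quantity; the toy model, its coefficients, the feature values, and the choice of a single baseline point to stand in for "absent" features are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features outside a coalition are set to their baseline value. The cost is
    exponential in the number of features, so this is for illustration only.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    """Hypothetical risk score with an interaction term (not a trained model)."""
    bmi, collections = features
    return 0.5 * bmi + 2.0 * collections + 0.1 * bmi * collections


x = [30.0, 2.0]          # applicant: BMI, number of collections (made up)
baseline = [25.0, 0.0]   # reference applicant (made up)
phi = shapley_values(toy_model, x, baseline)
```

The attributions are additive in the sense that they sum to the difference between the applicant's score and the baseline score, which is the property that lets an explanation file break a prediction down feature by feature.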

Explanations can include either or both text data and graphical data determined by explanation module 250. Example explanation models 250 present additive feature attribution outputs in graphical formats, e.g., as shown in FIGS. 8 and 9.

In various embodiments, a rule-based natural language component of explanation methods 250 translates model outcomes 260 into standardized text explanations. Text explanations can describe risk scoring outcomes 264 and underwriting decisions 268. Text explanations can be aimed at interpretability at a holistic level 274 and at a modular level 278. Text explanations can include qualitative information and quantitative information. Examples of modular level text explanations 278 include “A fluidless mortality model assigns risk class X to every male with a BMI this high”; “A fluidless mortality model places a large weight on BMI, and your BMI is Y points higher than average”; “If your BMI were Y points lower, you would have received risk class Z”; “A fluidless smoking propensity model estimated excessive risk of smoking propensity, given that the fluidless application does not require submission of a urine screen and a cotinine test.” Examples of holistic level text explanations 274 include “Fluidless applicants are offered the best, lowest-priced risk classes they would have received had they been traditionally underwritten with knowledge of lab results”; “The fluidless application does not qualify as a low risk application, but the applicant can resubmit the application with laboratory tests and biophysical measurements.”
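A rule-based text component of this kind can be sketched as a template that is filled in when a rule fires. The example below reuses one of the modular level templates quoted above; the function name, the rule condition (BMI above average), and the rounding are hypothetical choices.

```python
def explain_bmi(applicant_bmi, average_bmi):
    """Fill a fixed text template when the applicant's BMI is above average.

    Returns None when the (hypothetical) rule does not fire.
    """
    points_above = round(applicant_bmi - average_bmi, 1)
    if points_above <= 0:
        return None
    return (
        "A fluidless mortality model places a large weight on BMI, "
        f"and your BMI is {points_above} points higher than average"
    )


message = explain_bmi(31.5, 27.0)
```

Keeping the wording in fixed templates, with only the quantitative slots filled from model outputs, is one way to keep the generated explanations standardized.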

In various embodiments, interpretable underwriting system 100 transmits reports of model outcomes 164, such as underwriting decisions, to user devices 180. In various embodiments, interpretable underwriting system 100 also transmits explanations 168 of model outcomes to user device 180. Users associated with a user device 180 may include customers (e.g., applicants for insurance), as well as enterprise users such as insurance agents, underwriters, system developers or other representatives of an insurance company, other financial services company, or insurance broker, among other possibilities. System 100 can generate explanations 168 in real time simultaneously or contemporaneously with generating model outcomes 164. Interpretable underwriting system 100 can report underwriting model outcomes and explanations to user device 180 simultaneously or contemporaneously.

FIG. 3 is a schematic diagram of a data acquisition and transformation procedure used to extract and transform historical applications data for historical insurance applicants of the enterprise, stored in Enterprise Databases 320. The procedure of FIG. 3 was used to acquire appended applications data for the historical applications database 226, and to transform that data via data transformation module 124. Procedure 300 acquired historical applications data and supplemented that data with non-traditional data using techniques that protect privacy rights of the applicants. Appended applications data included non-traditional data attributes that supplement traditional underwriting attributes previously tracked by the enterprise for these historical applicants. A third-party data vendor of Credit Risk/Public Records databases 310 used personally identifiable information to match data 330 to internal records. Subsequently, the third-party database vendor removed the personally identifiable information prior to returning the data set to the sponsoring enterprise at 340 with credit risk and public records attributes appended. At the final stage at 350, the data set in Enterprise Databases 320 was decrypted, and had a de-identified state to protect the privacy of customers.

The fluidless model suite 240 acquires data from four primary data sources in modeling via data acquisition and transformation module 124. FIG. 4 shows a simplified schematic of a system 400 for evaluating program-eligible applicants for approval to receive a fluidless underwriting offer. Data sources of system 400 include two nontraditional underwriting sources, public records 434 and credit risk data 438, and two traditional underwriting sources, client medical interviews (“CMI”; application part II 458), and prescription drug histories (Rx data 442). As used in the present disclosure, the generic term “public data” denotes data relating to applicants of the enterprise obtained from one or more third-party sources, and encompasses both “public records” and “credit risk data.”

In various embodiments, “public records” include attributes that pertain to individual-level records that are filed by a public office, such as addresses, education, licenses, property, assets, and financial disclosures. Example public records attributes include the number of lien records on file, time since the most recent bankruptcy filing, number of evictions, and the tax-assessed value of the individual's current address. Public records data set 434 is acquired via third-party API 190. In preparation for model training, this data was acquired via the data append procedure 300 of FIG. 3. In production, this information is retrieved in real time through API calls to a third-party vendor of Public Records database 434.

In various embodiments, “credit risk data” include attributes with information that pertains to credit behavior, such as types of open accounts, account balances, derogatory remarks, collections, and bankruptcies. Example credit risk data attributes include the number of collections, ratio of amount past due to amount of total balances, and number of open auto finance accounts. Credit risk data set 438 is acquired via third-party API 190. In preparation for model training, this data was acquired via the data append procedure 300 of FIG. 3. In production, this information is retrieved in real time through API calls to a third-party vendor of Credit Risk database 438.

The CMI data set 458 consists of an extensive questionnaire filled out by life insurance applicants. This digital questionnaire covers personal and family health history and behavioral topics. Behavioral topics include motor vehicle violations, smoking, and other topics pertaining to behavioral risks. During model development, non-digital questionnaire responses were digitized, and data transformation procedures were applied to generate features that were incorporated into the model suite 410. In an example, the resulting training data included over 400 columns including both Boolean answers and keyword extraction on open-text fields that align to major medical impairments. In production, digital CMI data 458 and Application Part I data 454 are received via user inputs at user device 180, transmitted via network 170, and stored in fluidless applications database 222. Alternatively, this data is received via paper application and is digitized for storage.

Prescription drug histories data (Rx 442) contains historical prescription drug fills for applicants. An example Rx 442 contains a seven (7) year history per applicant. Each fill record can include the drug generic name, brand name, NDC (National Drug Code), priority, days supply, quantity, prescription fill date, physician name, registration number, and specialty. In an example, the Rx data set constructed during model building contains data for thousands of applicants, including millions of prescription fill records. In production, the Rx data is collected in real time via API calls to one or more third-party vendors of online computer databases of medical prescription records.

In the flow chart schematic of FIG. 5, during model development, data acquisition and transformation module 124 applied various data pre-processing procedures 500 to acquired data. Step 502 excluded certain features based on privacy or regulatory considerations and other factors.

Feature engineering procedures 504, 506, and 508 combined or otherwise transformed data attributes in various ways to construct engineered variables that were potentially more useful in modeling. For example, in the medical literature, body-mass index (BMI) has been shown to be a more directly causal driver of mortality risk than weight (and especially height) alone. An example of an engineered variable is BMI as a function of height and weight, which addressed the significant interaction between height and weight. In various embodiments, engineered variables also were constructed for credit risk and public records attributes.
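As a simple illustration of such an engineered variable, BMI can be derived from the two raw attributes using the standard definition (weight in kilograms divided by the square of height in meters). The function name and the metric units of the inputs are assumptions; applications that record pounds and inches would need a unit conversion first.

```python
def body_mass_index(weight_kg, height_m):
    """Standard BMI: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


bmi = body_mass_index(80.0, 2.0)  # made-up applicant values
```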

Data transformation procedures generated various classes of engineered features: indicators, ratios, and temporal rates. Step 504 constructed count indicator variables. This procedure addressed variables that are measured as a count (e.g., number of felonies) that have a very high proportion of zeros, with a very infrequent but long tail. Feature engineering constructed several indicator variables that reflect any non-zero count of such events.
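A count indicator of this kind collapses a zero-heavy, long-tailed count into a binary flag. The following minimal sketch uses a hypothetical function name and made-up felony counts.

```python
def count_indicator(count):
    """Return 1 for any non-zero count of events, otherwise 0."""
    return 1 if count > 0 else 0


felony_counts = [0, 0, 3, 0, 1]  # made-up values, mostly zeros
has_felony = [count_indicator(c) for c in felony_counts]
```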

Step 506 calculated ratios between countervailing quantitative variables. Developing ratios between counts of countervailing quantities can be useful for statistical efficiency. In an example, liens to properties is a weighted ratio of the number of filed or released liens to the number of owned properties (e.g., houses, aircraft). This ratio was highly predictive in a fluidless mortality model 242 that relied on public records and CMI (application part II 458) as modeling inputs.
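The following minimal sketch shows a ratio feature of this kind in unweighted form. The weighting used for the liens-to-properties ratio is not specified here, so none is applied; the function name and the default value returned when no properties are on file are assumptions.

```python
def safe_ratio(numerator, denominator, default=0.0):
    """Ratio of two counts; `default` (an assumption) is used when the denominator is zero."""
    if denominator == 0:
        return default
    return numerator / denominator


liens_to_properties = safe_ratio(3, 2)   # 3 liens, 2 owned properties (made up)
no_properties = safe_ratio(1, 0)         # falls back to the default
```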

Step 508 determined temporal rates and temporal extents of time-dependent attributes. The credit risk and public records attributes often denote counts of quantities within certain temporal extents. As presented, these attributes are overlapping and highly correlated. This procedure develops features that represent rates of change across different durations (kinematics), such as measurements of actual change, velocity, and acceleration. Examples of time-dependent attributes include the number of non-derogatory accounts that are provided within the past 3 months, 6 months, 12 months, 24 months, and 60 months. From these attributes, step 508 can compute, for example, the velocity of non-derogatory accounts from 5 years to 2 years ago as the difference between the counts at those time periods divided by the 3-year duration.
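The velocity example above can be worked through as follows. The function name, the dictionary layout keyed by look-back window in months, and the account counts are hypothetical.

```python
def velocity(counts_by_months, older_months, newer_months):
    """Change in count per year between two look-back windows."""
    delta = counts_by_months[older_months] - counts_by_months[newer_months]
    years = (older_months - newer_months) / 12
    return delta / years


# Made-up counts of non-derogatory accounts within each look-back window (months).
non_derogatory = {3: 1, 6: 2, 12: 4, 24: 5, 60: 11}

# From 5 years ago to 2 years ago: (11 - 5) accounts over a 3-year duration.
v = velocity(non_derogatory, 60, 24)
```

Because the windows overlap, differencing them in this way isolates the activity between the two horizons, which is one way to reduce the correlation among the raw attributes.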

Step 510 identified missing values in the acquired data, and derived substitute values for many of the missing values via imputation rules. Missing values were pervasive across the credit risk and public records attributes. In an example of missingness procedure 510, the procedure was designed to avoid adverse impact to any individual score without knowledge of an observed value. In other words, the resulting model score should not be beneficial or detrimental to a given applicant with respect to similar applicants, if an unobserved value is passed as a model input.

In view of these objectives, missingness procedure 510 systematically imputed the median, mode, or a default value conditioned on an applicant's cohort as observed in the training data. A median was imputed if the variable was continuous, a modal value if categorical, and a default setting if specified in provided data dictionaries. During model training, the procedure sampled with replacement from observed values either with a median or modal value depending on variable type. Additionally, for continuous variables, the procedure avoided sampling from the 10% extreme of the tails to avoid over-representing outlying values.
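The cohort-conditioned median or mode rule can be sketched as below. This sketch covers only the deterministic fill values; it does not reproduce the sampling with replacement used during training, the tail exclusion, or data-dictionary defaults. The function names, column names, cohort labels, and values are hypothetical.

```python
from statistics import median, mode


def fit_imputation_values(rows, cohort_key, column, kind):
    """Learn one fill value per cohort from observed (non-None) training values."""
    observed = {}
    for row in rows:
        if row[column] is not None:
            observed.setdefault(row[cohort_key], []).append(row[column])
    summarize = median if kind == "continuous" else mode
    return {cohort: summarize(values) for cohort, values in observed.items()}


def impute(row, cohort_key, column, fill_values):
    """Return a copy of `row` with a missing value replaced by its cohort's fill value."""
    if row[column] is None:
        return {**row, column: fill_values[row[cohort_key]]}
    return row


training = [
    {"cohort": "M40", "balance": 100.0},
    {"cohort": "M40", "balance": 300.0},
    {"cohort": "M40", "balance": None},
    {"cohort": "F40", "balance": 50.0},
]
fills = fit_imputation_values(training, "cohort", "balance", "continuous")
completed = impute({"cohort": "M40", "balance": None}, "cohort", "balance", fills)
```

Filling with a cohort-typical value is consistent with the stated objective that an unobserved input should neither help nor hurt an applicant relative to similar applicants.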

Given an understanding of the acquired data set and of missingness associated with each source, model training of the underwriting models 154, 240 employed univariate analysis 128 of the statistical association between individual attributes and the target outcomes of interest. FIGS. 6A-6D show several examples of this univariate analysis. It should be understood that the charts of FIGS. 6A-6D are disclosed only as illustrative examples of using observations in historical data as a basis for building multivariate models.

FIG. 6A and FIG. 6B display the relationship between an individual's number of collections (medical and utility excluded) and input address length of residence with survival, as measured by the actual-to-expected (A/E) ratio. The A/E ratio compares the actual number of deaths with the expected number of deaths conditioned on the age, sex, duration of observation, and smoking status makeup of the underlying individuals. A low A/E signifies a low-risk set of individuals, while an A/E of 100% indicates that the expected number of individuals have died. Underwriting protocols aim to stratify a population, here life insurance applicants, to produce pools of risk with target A/E values that drive premiums. In FIG. 6A, number of collections (excluding medical and utility) has a monotonically increasing effect on mortality, with 0 collections being associated with an A/E of 65.8% and 3 or more collections with an A/E of 110.9%. In FIG. 6B, input address length of residence has a negative relationship with mortality. A length of residence less than or equal to 2 years is associated with an A/E of 80.9% and a length of residence of more than 30 years has an A/E of 70.4%.
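The A/E calculation itself can be sketched as follows, assuming each individual carries an expected death probability drawn from some mortality basis for that individual's age, sex, duration, and smoking status. The function name, the record layout, and the probabilities are hypothetical; the mortality basis is not specified here.

```python
def ae_ratio(records):
    """A/E in percent from (died, expected_death_probability) pairs."""
    actual = sum(1 for died, _ in records if died)
    expected = sum(probability for _, probability in records)
    return 100.0 * actual / expected


# Made-up group: one observed death against two expected deaths.
group = [(True, 0.5), (False, 0.5), (False, 0.5), (False, 0.5)]
ae = ae_ratio(group)
```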

FIG. 6C and FIG. 6D show how an individual's number of derogatory public records, which include felonies, liens, bankruptcies, and evictions, and the individual's total number of accounts, are associated with smoking incidence. In FIG. 6C, an individual's total number of accounts has a negative relationship with smoking, with no accounts being associated with smoking incidence of 19.4%, 3.4 times higher than having 3 or more accounts on file, 5.7%. In FIG. 6D, number of derogatory records has a positive relationship with smoking, with one or more derogatory records being associated with 80% higher smoking incidence than having no derogatory records, 9.7% versus 5.3%. In view of significant univariate relationships between these newly acquired data attributes and target outcomes of the present fluidless protocols, an embodiment of the present disclosure builds multivariate models with protective value for mortality and smoking risk.

Initial data acquisition and pre-processing procedures (e.g., data append, missing value imputation, feature generation and exclusion) provide complete data with a sizable, though reduced, set of potential predictors for model construction. The modeling framework is designed to select model inputs from these predictors based on their joint ability to optimize a given objective function of each model in the fluidless model suite 240. This objective is model-specific, e.g., to minimize predicted mortality error or to maximize predicted likelihood of smoking. The Random Forest Modeling module 130 serves as a general, reusable framework that yields relatively parsimonious models while optimizing the specific objective of each model.

Algorithm 1 presents a backward feature selection process that balances held-out performance with rapid, but principled, model development. Beginning with the superset of variables and inclusion of random noise vectors, the algorithm trains a series of models using k-fold cross-validation to generate held-out predictions to compute a model score and an averaged variable importance ranking across all variables. All variables that fall below the importance ranking of random noise are then dropped to produce the set of model covariates for the subsequent iteration. The process is repeated until the covariate list converges, and a final model is trained without the inclusion of the random variables. Corresponding pseudocode of an iterative procedure for selecting a locally minimal set of variables that yields a high-performing model is described as follows:

Algorithm 1: BackwardModelSelection(D,k,HS)

    D ← D plus random vectors R = R₁ . . . R_(j)
    iter ← 1
    V₀ ← Ø
    V₁ ← attrs(D)
    while V_(iter) ≠ V_(iter-1) do
        model scores S ← Ø
        variable importance list VI ← Ø
        for hyperparameter setting hs ∈ HS do
            predictions P ← Ø
            for fold f in 1 to k do
                train model M_(hs) on D_(1:k\{f}) with covariate set V_(iter)
                P ← P ∪ predict(M_(hs), D_(f))
                VI_(f,hs) ← importance(M_(hs))
            S_(hs) ← evaluate(P, D)
        M ← argmin_(hs) S
        dropped variables DV ← V > R in avg(VI_(hs))
        iter ← iter + 1
        V_(iter) ← V_(iter-1) \ DV
    return M
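The control flow of Algorithm 1 can be sketched in runnable form as follows. This is a minimal illustration under several assumptions, not the disclosed implementation: `ToyModel` is a stand-in learner (an average of univariate linear fits with absolute correlation as its importance measure), not a random forest; the single hyperparameter is a hypothetical shrinkage factor; RMSE is used as the score; importance is averaged over folds for the best-scoring setting; and a variable is dropped when it ranks at or below the highest-ranked noise vector.

```python
import random
from statistics import mean


def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


class ToyModel:
    """Stand-in learner (NOT a random forest): averages univariate linear fits."""

    def __init__(self, rows, features, target, shrink):
        ys = [r[target] for r in rows]
        self.features, self.intercept = list(features), mean(ys)
        self.mu, self.slope, self.imp = {}, {}, {}
        for f in self.features:
            xs = [r[f] for r in rows]
            mx = mean(xs)
            sxx = sum((x - mx) ** 2 for x in xs)
            sxy = sum((x - mx) * (y - self.intercept) for x, y in zip(xs, ys))
            self.mu[f] = mx
            self.slope[f] = shrink * sxy / sxx if sxx > 0 else 0.0
            self.imp[f] = abs(pearson(xs, ys))

    def predict(self, row):
        if not self.features:
            return self.intercept
        adj = sum(self.slope[f] * (row[f] - self.mu[f]) for f in self.features)
        return self.intercept + adj / len(self.features)

    def importance(self):
        return dict(self.imp)


def rmse(preds, actuals):
    return mean((p - a) ** 2 for p, a in zip(preds, actuals)) ** 0.5


def backward_model_selection(rows, target, k, hyper_settings, n_noise=3, seed=0):
    rng = random.Random(seed)
    noise = [f"noise_{j}" for j in range(n_noise)]
    # D <- D plus random vectors R
    rows = [{**r, **{n: rng.gauss(0, 1) for n in noise}} for r in rows]
    previous, current = None, [c for c in rows[0] if c != target]
    while current != previous:
        scores, importances = {}, {}
        for hs in hyper_settings:
            preds, fold_imps = [None] * len(rows), []
            for fold in range(k):
                train = [r for i, r in enumerate(rows) if i % k != fold]
                model = ToyModel(train, current, target, hs)
                for i, r in enumerate(rows):
                    if i % k == fold:
                        preds[i] = model.predict(r)  # held-out prediction
                fold_imps.append(model.importance())
            scores[hs] = rmse(preds, [r[target] for r in rows])
            importances[hs] = {v: mean(m[v] for m in fold_imps) for v in current}
        best = min(scores, key=scores.get)  # argmin over hyperparameter settings
        avg = importances[best]
        noise_level = max(avg[n] for n in noise)
        dropped = [v for v in current if v not in noise and avg[v] <= noise_level]
        previous, current = current, [v for v in current if v not in dropped]
    final_vars = [v for v in current if v not in noise]
    # Final model is trained without the random noise variables.
    return ToyModel(rows, final_vars, target, best), final_vars


# Made-up data: y depends on x1 only; x2 is irrelevant.
data_rng = random.Random(1)
data = []
for _ in range(200):
    x1 = data_rng.gauss(0, 1)
    data.append({"x1": x1, "x2": data_rng.gauss(0, 1), "y": 2 * x1 + data_rng.gauss(0, 0.1)})
model, selected = backward_model_selection(data, "y", k=5, hyper_settings=[0.5, 1.0])
```

Because the covariate set can only shrink from one iteration to the next, the loop terminates once an iteration drops nothing, which corresponds to the convergence test in the pseudocode.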

In various embodiments, this procedure is applied to construction of both the Fluidless Mortality models 242, 412 and the Smoking Propensity models 244, 414.

In various embodiments, models of the fluidless model suite 240 comprise machine learning models that are trained on various sets of training data. Suitable machine learning model classes include but are not limited to random forests, logistic regression methods, support vector machines, gradient tree boosting methods, nearest neighbor methods, and Bayesian regression methods.

Models of fluidless model suite 240 use one or more models 132, 134, 136 within the Random Forests ensemble 130 for Survival, Regression, and Classification (RF-SRC). Random Forest Modeling module 130 serves as a general, reusable framework that yields relatively parsimonious models while optimizing the specific objective of each model. In Random Forests methods, ensemble learning is improved by injecting randomization into the base learning process. RF-SRC extends Random Forests methods and provides a unified treatment of the methodology for models including right censored survival (single and multiple event competing risk), multivariate regression or classification, and mixed outcome (more than one continuous, discrete, and/or categorical outcome). When one continuous or categorical outcome is present, the model reduces to univariate regression or classification respectively.

Random forests models for classification (model 136) work by fitting an ensemble of decision tree classifiers on subsamples of the data. Each tree only sees a portion of the data, drawing samples of equal size with replacement. Each tree can use only a limited number of features. By averaging the output of classification across the ensemble, the random forests model can limit over-fitting that might otherwise occur in a decision tree model.

In an example, model training used 10-fold cross-validation and a grid search of relevant hyperparameters (number of trees, minimum size of terminal nodes) for random forests.

In various embodiments, the predictive machine learning models identify features that have the most pronounced impact on predicted value. Different types of fluidless underwriting models may identify different features as most important. For example, a model based upon a mortality risk signal may identify different leading features than a model based upon a tobacco propensity signal. In various embodiments, leading model features were extracted from sources such as Public Records data 434, Credit Risk data 438, and CMI data 458.

In an example, the predictive value of model features was measured using the minimal depth of a maximal subtree (7), i.e., shortest distance from the root node to the parent node of the maximal subtree, as a variable importance metric. The importance metric used conventionally for random forests is permutation-based variable importance, a time-consuming procedure. As applied within the global structure of the random forests ensemble, the minimal depth of a maximal subtree is a more efficient importance metric, which is faithful to the global structure of the model and is independent of model task and the values calculated at terminal nodes.

Variable importance metrics for random forests can exhibit biases with respect to the number of chosen splits for features with different distributions and cardinalities. Modeling injected random noise variables to compensate for this effect using a computationally efficient procedure. Injected random noise variables corresponded to several main categories of distributions observed in the data set: (1) normal for continuous values; (2) binary with a proportion set to the mean proportion across all binary variables; and (3) two negative binomial variables for count-based features that exhibit small and large dispersion.

FIG. 4 displays a simplified schematic of a system for evaluating program-eligible applicants for approval to receive a fluidless underwriting offer. System 400 requests inputs 420 from various third-party APIs 430 and 440 and receives fluidless digital applications of the sponsoring enterprise. System 400 tests inputs 420 across a set of fluidless models 410 of the enterprise in order to determine whether to present an accelerated underwriting offer to the applicant. Models 410 include a comprehensive algorithmic rule system 418 and three probabilistic models 412, 414, and 416. In various embodiments, in order to receive approval for presentation of an accelerated underwriting offer, the application must pass all model components 412, 414, 416, and 418. Models 410 can be implemented using a single-processor system including one processor, or a multi-processor system including any number of suitable processors that may be employed to provide for parallel and/or sequential execution of one or more of the model components 412, 414, 416, and 418.

In various embodiments, for one or more of probabilistic model components 412, 414, and 416, the fluidless underwriting protocol determines a quantitative risk score for the fluidless application and determines whether the respective risk score exceeds a set eligibility threshold for the respective model. For each of these model components, the system 300 incorporates eligibility thresholds established using a threshold-setting procedure. These eligibility thresholds can be important tools for actuarial analysis, i.e., for determining an observable correlation between policyholder characteristics and cost to the sponsoring enterprise. In an embodiment, the threshold-setting procedure determines a certain percentage of the business of a given risk class assignment to be eligible. The threshold-setting procedure can set cohort-specific thresholds by decreasing volume incrementally until a target mortality impact is reached. The threshold-setting procedure can set different thresholds for the various component models. The threshold-setting procedure can set thresholds within a pre-set range of minimum and maximum risk scores.
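The pass-all-components logic can be sketched as follows. The function name, the model keys, the scores, and the thresholds are hypothetical, and the sketch assumes that a score above its threshold means the component is not passed.

```python
def accelerated_offer_eligible(risk_scores, thresholds, rules_cleared):
    """True only if every probabilistic score is within its threshold and all rules are cleared."""
    models_pass = all(risk_scores[name] <= thresholds[name] for name in thresholds)
    return models_pass and rules_cleared


# Made-up per-model thresholds and one applicant's scores.
thresholds = {"mortality": 0.5, "smoking": 0.4, "rx_fills": 0.5}
scores = {"mortality": 0.2, "smoking": 0.1, "rx_fills": 0.3}

eligible = accelerated_offer_eligible(scores, thresholds, rules_cleared=True)
```

Keeping one threshold per model makes it straightforward for a threshold-setting procedure to tune each component independently, for example per cohort.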

Fluidless Algorithmic Rule System 418 stores and applies rules that reflect a comprehensive set of medical and underwriting guidelines developed by experts in underwriting and insurance medicine, but that exclude rules based on clinical laboratory data. In the present disclosure, rules applied by the Fluidless Algorithmic Rule System 418 are sometimes called “non-clinical rules.” In an example, module 418 stores about 4000 non-clinical rules. Each rule determines the best available risk class in the presence of certain values in the application. For example, a high BMI would preclude an applicant from receiving a preferred-risk offer. Fluidless Algorithmic Rule System 418 executes all non-clinical rules across data retrieved from various Clinical Guidelines databases 440. Fluidless Algorithmic Rule System 418 executes all non-clinical rules across data retrieved from Prescription Drug (Rx) database 442, Medical Information Bureau (MIB) database 444, and motor vehicle records (MVR) database 446.

The Prescription Drugs (Rx) input 442 determines whether the application remains eligible in view of publicly available information about prescription fills. Additional eligibility criteria are checked by applying the automated rules to other inputs retrieved from MIB database 444 and MVR database 446. Fluidless Algorithmic Rule System 418 can “clear” an algorithmic rule if it identifies adequate cause to override information flagged by the system. If one or more rules of module 418 remain “red,” the system can automatically notify the advisor assigned to the applicant to order a lab test and paramedical examination.

In building the predictive models of the present disclosure, model datasets may have populations in the hundreds of thousands or millions of individuals. In an example, Fluidless Mortality Model 412 was built from historical applications of the sponsoring enterprise containing 1.3 million records. Data preprocessing retained applicants with no missing CMI, BMI, or public records, yielding a training set of around 230,000 historical applications 226. In an example, the Rx data set 442 constructed during model building contained data for more than 120,000 applicants, including more than three million prescription fill records.

Fluidless Mortality Model 412 predicts mortality risk of a given individual relative to the individual's age and sex cohort without use of clinical data. Fluidless Mortality Model 412 seeks to identify fluidless applicants that pose the lowest mortality risk, to accelerate their experience with a simplified underwriting process. In processing a fluidless application, if Fluidless Mortality Model 412 determines a mortality risk score above a predetermined level, then the system automatically notifies the advisor assigned to the applicant to order a lab test and paramedical examination.

Traditional methods of underwriting for mortality employ survival modeling for predicting ground-truth mortality. Survival modeling seeks to approximate the survival function, which describes the probability that an event, occurring at a random variable time, occurs later than some given time. In lieu of employing survival modeling as is conventional for predicting ground-truth mortality, Fluidless Mortality Model 412 can use a regression framework 134 to predict relative mortality. Regression framework 134 was seen to avoid temporal inconsistencies with public records and credit risk attributes that would occur in using the full range of exposure in survival modeling, and enabled more efficient model development in dealing with hundreds of predictors and iterative feature selection.

Fluidless Mortality Model 412 was trained using a regression framework with historical underwriting risk classes, using assigned risk classes from a retrospective study to generate mortality assumptions in the applications in training data. In order to incorporate nontraditional data sources (public records and credit risk) to measure mortality risk, an initial stage of the modeling pipeline was to ensure that any attribute used as a potential covariate in the model is actuarially justified. Using the data preprocessing steps 500 of FIG. 5, the modeling pipeline fitted a survival model 132 on solely the public records and credit risk attributes 430. The features that showed no predictive signal directly with observed deaths were excluded from subsequent steps of the modeling pipeline using regression.

To assess the performance of each regression model, the Fluidless Mortality Model 412 incorporates a mortality-impact metric that is designed to weigh actual low-risk applicants and predicted low-risk applicants more heavily. In various embodiments, the fluidless underwriting protocol is primarily concerned about individuals who receive a low-risk score from Fluidless Mortality Model 412 and are likely to be accelerated underwriting-eligible, or who truly have low relative mortality. To account for this, the mortality-impact metric computes a weight for each individual i such that a low prediction or true label is associated with a higher weight, causing the error associated with these individuals to incur a larger penalty:

$w_{i} = \frac{1}{\min\left( {y_{i},{\hat{y}}_{i}} \right)}$

where y_(i) is the true label and ŷ_(i) is the prediction for individual i. This weight is multiplied by the difference between the prediction and the label, and the total error is taken as the square root of the mean of these weighted differences. In the present disclosure, the resulting mortality-impact metric is denoted as the mortality-impact weighted root mean-squared error (WRMSE):

$e_{w} = \sqrt{\frac{\sum_{i}{\left( {y_{i} - {\hat{y}}_{i}} \right)*w_{i}}}{N}}$
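The metric can be illustrated with a short Python sketch. This is not the implementation of the disclosure. The formula as printed shows no exponent on the difference; the sketch squares each difference before weighting, which is an assumption made here so that the quantity under the root is non-negative and agrees with the "root mean-squared" name. The function name and the requirement that labels and predictions be strictly positive are likewise assumptions.

```python
import math


def mortality_impact_wrmse(y_true, y_pred):
    """Weighted RMSE with w_i = 1 / min(y_i, y_hat_i).

    Assumptions (not stated in the source): differences are squared
    before weighting, and all labels and predictions are strictly
    positive relative-mortality values.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        if y <= 0 or y_hat <= 0:
            raise ValueError("labels and predictions must be positive")
        weight = 1.0 / min(y, y_hat)  # low label or low prediction -> high weight
        total += weight * (y - y_hat) ** 2
    return math.sqrt(total / len(y_true))


# The same absolute error is penalized more for a low-risk individual.
low_risk_error = mortality_impact_wrmse([0.5], [0.7])
high_risk_error = mortality_impact_wrmse([1.5], [1.7])
```

Under these assumptions, an error of 0.2 on an individual with relative mortality 0.5 contributes more to the metric than the same error on an individual with relative mortality 1.5.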

Smoking Propensity Model 414 addresses a challenge of the fluidless underwriting protocol, that actual knowledge of an individual's tobacco usage is a central factor in assessing mortality risk. Clinical laboratory tests detect nicotine metabolites in fluid samples, but this indicator of tobacco usage is missing in the fluidless underwriting process. Rather than rely solely on self-reporting of tobacco usage, the fluidless modeling suite includes a Smoking Propensity Model 414 that specifically predicts tobacco usage.

Based on tobacco usage of the insured that is self-reported in the digital fluidless application, the accelerated underwriting offer includes the two risk classes UPNT and SPT, which are the best non-tobacco and tobacco fluidless risk classes respectively. If no tobacco usage were disclosed in the fluidless application, the conventional risk assignment would be the standard non-tobacco risk class (UPNT), while if tobacco usage were disclosed in the fluidless application, the conventional risk assignment would be the standard tobacco risk class (SPT). In an example, the Smoking Propensity Model 414 identifies tobacco usage by a significant portion of the unreported tobacco users that otherwise would be assigned a UPNT status, and denies these individuals accelerated underwriting. Performance testing has confirmed that the Smoking Propensity Model 414 significantly reduces adverse impacts on mortality risk of the fluidless underwriting program due to unreported tobacco usage.

Smoking Propensity Model 414 can define smoking as a binary outcome. The training data set assigns the status of smoking to individuals who either had a positive nicotine test from urine or saliva or were offered a tobacco risk class. In the training data, males have a smoking status at a higher rate than females across all ages. Younger males have a smoking status at a much higher rate than older males. The rate of smoking status has significantly decreased over time during the time period of the historical applications. Based on testing of various modeling methods, random forests performed comparably to, or better than, the other methods. This model class was selected for the Smoking Propensity Model 244, 414 for consistency across the fluidless model suite 240. The smoking propensity model is a classifier (using RF-C model 136) that estimates the propensity for an individual to be a smoker, i.e., for the smoking status of a particular individual to be TRUE given a set of predictors.

In various embodiments, the Smoking Propensity Model was initialized with the same set of attributes as the Fluidless Mortality Model, and was trained in two different model versions. A first smoking propensity model was trained with both credit risk and public records, while a second smoking propensity model was trained with only public records. A version of the smoking propensity model adopted as Smoking Propensity Model can be trained with both credit risk and public records. Feature selection resulted in 59 attributes. A lift curve for the Smoking Propensity Model 414 indicates that this model identifies approximately 2.5 times more smokers in the first decile than the overall baseline smoking rate.
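The first-decile lift mentioned above can be computed as in the following Python sketch. It is offered only as an illustration: the function name is hypothetical, and the data in the usage lines are invented toy values, not results from the disclosure.

```python
def top_decile_lift(scores, labels):
    """Ratio of the smoking rate among the 10% highest-scored
    individuals to the overall (baseline) smoking rate.

    scores: predicted smoking propensities.
    labels: 1 for a smoking status of TRUE, 0 otherwise.
    """
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    base_rate = sum(labels) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    # Rank individuals from highest to lowest predicted propensity.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    k = max(1, n // 10)  # size of the first decile
    top_rate = sum(labels[i] for i in order[:k]) / k
    return top_rate / base_rate


# Toy data: 20 individuals, 4 smokers, both top-scored individuals smoke.
toy_scores = [i / 20 for i in range(20)]
toy_labels = [0] * 20
for idx in (0, 5, 18, 19):
    toy_labels[idx] = 1
toy_lift = top_decile_lift(toy_scores, toy_labels)
```

A lift of 2.5 in the first decile would mean that the tenth of applicants with the highest predicted propensity contains smokers at 2.5 times the overall rate.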

The Rx Model 416 (Prescription Fills model 246) predicts the probability of declining an accelerated underwriting offer or issuing a substandard offer conditioned on information derived from prescription drug fills. Rx records in training data were pulled from a third-party pharmacy database vendor. The Rx data included prescribed drugs and dosages, dates filled and re-filled, therapeutic class, and name and specialty of the prescribing doctor. In addition, Rx data included a priority associated with each drug based on an analysis of each individual's prescription drug history. Priority is indicated by color labeling of red, yellow and green, with red signaling the greatest risk.

Given a set of Rx fills for each individual, model training can generate aggregate features to characterize the full prescription drug history. These features include the overall variety and total fills of drugs (by red, yellow, and green priority), variety and total fills of recent high-risk (red) drugs, variety and total fills of opioid-related drugs, and number of physicians with specialties related to severe diseases. In addition, model training selected the top 50 generic drugs by calculating the proportion of substandards and declines associated with each drug, ranking their importance after adjusting for the joint credibility of all drugs and their interactions using a Markov chain Monte Carlo approximation. The most important drugs are generally prescribed for diabetes, heart disease, mental health, and other serious conditions. The final set of predictors also included age, gender and BMI.
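Part of the aggregation described above can be sketched in Python as follows. The record layout (keys `drug`, `priority`, and `is_opioid`) and the feature names are hypothetical, chosen only for illustration; the recency and physician-specialty features are omitted because the source does not define them precisely.

```python
def aggregate_rx_features(fills):
    """Summarize one individual's prescription fills.

    fills: list of dicts with hypothetical keys
        "drug" (generic name), "priority" ("red", "yellow" or "green"),
        and "is_opioid" (bool).
    Returns variety (distinct drugs) and total fills by priority color,
    plus the same two counts for opioid-related drugs.
    """
    features = {}
    for color in ("red", "yellow", "green"):
        subset = [f for f in fills if f["priority"] == color]
        features["total_fills_" + color] = len(subset)
        features["variety_" + color] = len({f["drug"] for f in subset})
    opioid = [f for f in fills if f.get("is_opioid", False)]
    features["total_fills_opioid"] = len(opioid)
    features["variety_opioid"] = len({f["drug"] for f in opioid})
    return features


# Invented example history: three fills of two distinct drugs.
example_fills = [
    {"drug": "metformin", "priority": "yellow", "is_opioid": False},
    {"drug": "metformin", "priority": "yellow", "is_opioid": False},
    {"drug": "oxycodone", "priority": "red", "is_opioid": True},
]
example_features = aggregate_rx_features(example_fills)
```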

Random forest classifier 136 was selected for Rx Model 416. In other embodiments, Rx Model 416 incorporates a classifier based on a model class other than random forests. If Rx Model 416 indicates an applicant has high risk, then the system automatically notifies the advisor assigned to the applicant to order a lab test and paramedical examination.

In an example of a method 700 for fluidless underwriting as shown in FIG. 7, at step 702 the fluidless underwriting system receives a fluidless digital application, e.g., from an applicant device. At an initial stage of processing, the analytical engine 114 screens the application for program eligibility based on a set of program parameters. Program eligibility parameters include (a) an age of the insured between 17 and 59, inclusive; (b) a BMI between 18 and 31; (c) a maximum face amount of the insurance policy; and (d) inclusion of the requested life insurance product on a list of available products.
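The initial screen can be sketched in Python as below. The age and BMI bounds come from the text; treating the BMI bounds as inclusive is an assumption, and the class name, field names, face-amount limit, and product names are hypothetical placeholders, since the source does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Application:
    # Hypothetical field names for the screened attributes.
    age: int
    bmi: float
    face_amount: float
    product: str


def meets_program_parameters(app, max_face_amount, available_products):
    """Return True when the application passes criteria (a) through (d)."""
    return (
        17 <= app.age <= 59                      # (a) age, inclusive
        and 18 <= app.bmi <= 31                  # (b) BMI, assumed inclusive
        and app.face_amount <= max_face_amount   # (c) maximum face amount
        and app.product in available_products    # (d) product on the list
    )


# Usage with placeholder program parameters.
PRODUCTS = {"term_20", "whole_life"}
eligible = meets_program_parameters(
    Application(age=35, bmi=24.0, face_amount=500_000, product="term_20"),
    max_face_amount=1_000_000,
    available_products=PRODUCTS,
)
```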

In an optional embodiment of step 702, if initial screening determines that an applicant meets the program parameters, to proceed with fluidless underwriting an advisor representing the sponsoring enterprise must also select the option for fluidless consideration during application submission. Advisors experienced with the criteria for eligibility for accelerated underwriting (also herein called “accelerated underwriting program”) can review the submitted digital application and may identify applicants that are unlikely to qualify for fluidless underwriting (e.g., if the applicant is diabetic or overweight) so as not to delay ordering clinical data.

If the application meets initial eligibility requirements, the method then retrieves 704 public records and clinical guidelines for the identified applicant. The application must pass a set of risk-related criteria to receive a fluidless offer. At step 706, the method determines eligibility of the fluidless application under Fluidless Algorithmic Rules. If the fluidless application is eligible under the Fluidless Algorithmic Rules, the process proceeds to step 710. If the fluidless application is not eligible under the Fluidless Algorithmic Rules, e.g., because one or more critical algorithmic rules result in a “red” determination, the process declines 708 the accelerated underwriting offer. In an example, Fluidless Algorithmic Rules result in a “red” determination if any major medical risk or non-medical risk (lifestyle risk) appears on the fluidless application. In various embodiments, the process declines the accelerated underwriting offer by automatically notifying the user (applicant) via user device 180 that it is necessary to obtain a lab test and paramedical examination and/or by automatically notifying an advisor assigned to the application to order the lab test and paramedical examination. In various embodiments, the process generates an explanation and displays it to the user when notifying the user of the declined application.

At step 710, the method determines eligibility of the fluidless application under the Fluidless Mortality Module. If the fluidless application is found eligible by the Fluidless Mortality Module, the process proceeds to step 714. If the fluidless application is not found eligible by the Fluidless Mortality Module, e.g., because the applicant is determined to have an unacceptable mortality risk score, the process declines 712 the accelerated underwriting offer, e.g., by automatically notifying the user (applicant) that it is necessary to obtain a lab test and/or by automatically notifying an advisor assigned to the application to order the lab test and paramedical examination.

At step 710, the method determines eligibility of the fluidless application under the Fluidless Mortality Model by determining a mortality risk rank. As used in the present disclosure, a mortality risk rank can include a raw mortality score. In another embodiment, a mortality risk rank includes a tier or group corresponding to a given mortality score, wherein the tier or group is selected from “high risk” and “low risk” tiers or groups that are based upon a distribution of mortality risk scores for a population of new business applicants of the enterprise. A mortality risk rank can include a percentile classification of a given mortality risk score relative to all mortality risk scores for a population of customers of the enterprise. A mortality risk rank can include a combination of the above types of rank. Similarly, in the present disclosure other risk ranks such as “second risk rank” and “third risk rank” may include one or more of these embodiments.
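The three forms of rank described above (raw score, percentile, and tier) can be combined as in the following Python sketch. It assumes that a higher raw score means higher mortality risk; the function names and the 90th-percentile cutoff are hypothetical and are not taken from the disclosure.

```python
from bisect import bisect_right


def percentile_rank(score, population_scores):
    """Percentage of population scores less than or equal to `score`."""
    ordered = sorted(population_scores)
    if not ordered:
        raise ValueError("population_scores must be non-empty")
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def mortality_risk_rank(score, population_scores, high_risk_percentile=90.0):
    """Combine raw score, percentile, and tier into one rank record.

    Assumption: higher scores indicate higher mortality risk, and the
    cutoff percentile is a placeholder value.
    """
    pct = percentile_rank(score, population_scores)
    tier = "high risk" if pct > high_risk_percentile else "low risk"
    return {"raw_score": score, "percentile": pct, "tier": tier}


# Toy population of four scores.
population = [0.1, 0.2, 0.3, 0.4]
rank_low = mortality_risk_rank(0.1, population)
rank_high = mortality_risk_rank(0.4, population)
```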

At step 714, the method determines eligibility of the fluidless application under the Second User Attribute Module. The Second User Risk Attribute model predicts likelihood of at least one risk factor that normally can be indicated by clinical data when included in an application. The Second User Attribute Model can be a Smoking Propensity Module. The Smoking Propensity Module predicts whether the applicant is a smoker or non-smoker, which normally can be indicated in clinical data included in typical medical examinations that screen for nicotine and cotinine in the urinalysis. If the fluidless application is found eligible by the Second User Attribute Module, the process proceeds to step 718. If the fluidless application is not found eligible by the Second User Attribute Module, e.g., because the applicant receives a smoking binary classification by the Smoking Propensity Module, the process declines 716 the accelerated underwriting offer, e.g., by automatically notifying the user (applicant) that it is necessary to obtain a lab test and/or by automatically notifying an advisor assigned to the application to order the lab test and paramedical examination.

At step 718, the method determines eligibility of the fluidless application under the Third User Attribute Module. The Third User Risk Attribute Module predicts likelihood of at least one additional risk factor that normally can be indicated by clinical data when included in an application, wherein the additional risk factor is different from the risk factor predicted by the Second User Risk Attribute Module. The Third User Attribute Model can be a Prescription Fills (Rx) Module. Prescription Fills (Rx) Module can predict whether the applicant has or is at risk from various diseases and conditions that normally can be indicated in clinical data. If the fluidless application is found eligible by the Third User Attribute Module, the process proceeds to step 722. If the fluidless application is not found eligible by the Third User Attribute Module, e.g., because the applicant receives a substandard or decline classification by the Prescription Fills (Rx) Module, the process declines 720 the accelerated underwriting offer, e.g., by automatically notifying the user (applicant) that it is necessary to obtain a lab test and/or by automatically notifying an advisor assigned to the application to order the lab test and paramedical examination.

In various embodiments, in the event of any of decline steps 708, 712, 716, and 720, the method generates and displays to the user an explanation file including the adverse model outcome. The explanation file may include a modular explanation 278 of the particular fluidless model component that generated the decline decision (failure to meet eligibility requirements for the relevant model component). The explanation file may include a holistic explanation 274 of the fluidless predictive model, e.g., required eligibility determinations by three component models; and may include an explanation of the decision to decline expedited underwriting.

In determining eligibility of the fluidless application against multiple risk attributes, in general the order of eligibility determinations is not critical. The operations can be performed in parallel or concurrently, and the order of the operations may be re-arranged. In the method 700, the steps of determining eligibility under Fluidless Algorithmic Rules 706, determining eligibility under Fluidless Mortality Module 710, determining eligibility under Second User Attribute Module 714, and determining eligibility under Third User Attribute Module 718 may be carried out in any order or concurrently.

At step 722, having passed all criteria 706, 710, 714, and 718, the method automatically presents an accelerated underwriting offer in the fluidless digital application. In various embodiments, step 722 automatically communicates the accelerated underwriting offer to the user (applicant) via user device 180 and/or automatically notifies an advisor assigned to the application to communicate the accelerated underwriting offer to the applicant.
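The sequence of eligibility determinations 706, 710, 714, and 718 followed by the offer at step 722 can be summarized by the following Python sketch. It only illustrates the control flow: the field names, thresholds, and check names are hypothetical, and the predicates stand in for the rule system and trained models of the disclosure.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each check is a (name, predicate) pair; the predicate returns True
# when the application is eligible under that check.
Check = Tuple[str, Callable[[dict], bool]]


def fluidless_decision(application: dict, checks: List[Check]) -> Dict[str, Optional[str]]:
    """Run the checks in order and decline at the first failure."""
    for name, passes in checks:
        if not passes(application):
            # A decline means clinical data (lab test, exam) is required.
            return {"offer": "declined", "declined_by": name}
    return {"offer": "accelerated", "declined_by": None}


# Hypothetical stand-ins for steps 706, 710, 714 and 718.
CHECKS: List[Check] = [
    ("algorithmic_rules", lambda a: not a["red_rules"]),
    ("mortality_model", lambda a: a["mortality_score"] <= 0.30),
    ("smoking_propensity", lambda a: a["smoking_propensity"] < 0.50),
    ("rx_model", lambda a: a["rx_risk"] < 0.50),
]

applicant = {"red_rules": [], "mortality_score": 0.12,
             "smoking_propensity": 0.08, "rx_risk": 0.20}
decision = fluidless_decision(applicant, CHECKS)
```

Because each check is independent of the others, the list could be reordered or evaluated concurrently without changing whether an offer is made; only the reported `declined_by` name depends on the order.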

In various embodiments, the system and method of the present disclosure use additive feature attribution techniques to approximate the outputs of survival mortality models in “interpretable” outputs. Additive feature attribution is a technique used to approximate the output of a complicated multivariate function as a linear one on a binary feature space. By building a model in this binary space, the linear coefficients represent the effect of the “presence” or “absence” of a feature on the model's output. For instance, in natural language processing applications, this binary feature is often defined to be the presence or absence of a word.

In the general case, given a data instance x={x₁, x₂, . . . , x_(d)} and a model f, this technique employs an interpretable function h_(x): {0,1}^(d)→ℝ^(d) that maps a binary vector to the original feature space via the formula h_(x)({1, 1, . . . 1})=x. A common choice for h_(x) is to define this invertible function so that h_(x) ⁻¹(x′)_(i)=1 when x_(i)′=x_(i) and h_(x) ⁻¹(x′)_(i)=0 otherwise. In this way, given a new data instance x′, the ith “interpretable feature” z_(i)′=h_(x) ⁻¹(x′)_(i) can be considered the Boolean answer to the question “Is x_(i)′ equal to x_(i)?”

An additive explanation model g: {0,1}^(d)→ℝ,

${g(z)} = {\varnothing_{0} + {\sum_{i = 1}^{d}{\varnothing_{i}z_{i}}}}$

attempts to achieve g(h_(x) ⁻¹(x′))≈f (x) when x′≈x. As a result, an effect Ø_(i) is attributed to each feature x_(i), and summing the effects of all feature attributions approximates the output on x:

${{f(x)} \approx {g\left( {h_{x}^{- 1}(x)} \right)}} = {{g\left( \left\{ {1,1,{\ldots 1}} \right\} \right)} = {\varnothing_{0} + {\sum_{i = 1}^{d}\varnothing_{i}}}}$

The system and method of the present disclosure can employ the additive feature attribution methodology of LIME (Local Interpretable Model-agnostic Explanations), an explanation technique that explains the predictions of any classifier, of Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin, “Why Should I Trust You?: Explaining the Predictions of Any Classifier,” KDD '16 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Pages 1135-1144, Aug. 13-17, 2016.

In various embodiments, the system and method of the present disclosure employ SHAP values, a feature attribution method that draws on game theory. Scott M. Lundberg, Su-In Lee, “A unified approach to interpreting model predictions,” NIPS '17 Proceedings of the 31st International Conference on Neural Information Processing Systems, Pages 4768-4777; Dec. 4-9, 2017. The SHAP (SHapley Additive exPlanation) value framework is based on Shapley values, a technique used in game theory to determine how much each player in a collaborative game has contributed to its success. Given a cooperative game in which different sets S of players can collaborate and produce certain outcomes V(S), Shapley values provide a method to divide the value generated by the coalition among the different players. The SHAP value framework extends this solution to the problem of dividing the model outcome f(x) between different interpretable features.

In SHAP values, the marginal contribution of a certain feature represents how much the “presence” of that feature changes the outcome of the function, given the “presence” of certain other features. In an example, S represented a subset of the features {1, 2, . . . d} and z_(S) represented the binary vector where z_(i)=1 for i∈S and z_(i)=0 otherwise. The marginal contribution Ø_(j) ^(S) for j∉S was defined as the difference in f when the jth feature in z is switched to a 1:

$\varnothing_{j}^{S} = {{f\left( {h_{x}\left( z_{S \cup j} \right)} \right)} - {f\left( {h_{x}\left( z_{S} \right)} \right)}}$

The SHAP value for a certain feature was then defined as a weighted average of all of its marginal contributions:

$\varnothing_{i} = {\sum_{S}{\frac{{|S|}!\left( {d - {|S|} - 1} \right)!}{d!}\varnothing_{i}^{S}}}$

This weight represents the number of orderings of the features in which the members of S appear as the first |S| features, immediately followed by feature i, expressed as a fraction of all d! possible orderings of the features.
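For a small number of features, the weighted average above can be evaluated exactly by enumerating every subset S, as in the following Python sketch. It assumes that an “absent” feature is represented by substituting a reference (baseline) value, which is one possible choice of h_(x); the function names and the toy model are hypothetical. The cost grows exponentially in d, so this is illustrative only.

```python
from itertools import combinations
from math import factorial


def h_x(x, baseline, present):
    """Map a coalition (set of 'present' feature indices) to the
    original feature space; absent features take baseline values."""
    return [x[i] if i in present else baseline[i] for i in range(len(x))]


def shapley_values(f, x, baseline):
    """Exact SHAP values by enumerating all subsets S not containing i."""
    d = len(x)
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for combo in combinations(others, size):
                s = set(combo)
                marginal = f(h_x(x, baseline, s | {i})) - f(h_x(x, baseline, s))
                phi[i] += weight * marginal
    phi0 = f(list(baseline))  # output on the all-absent ("null") case
    return phi0, phi


# Toy model with an interaction between features 1 and 2.
def toy_model(v):
    return 2 * v[0] + v[1] * v[2]


phi0, phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

In this toy case the interaction term v[1]*v[2] = 6 is split equally between features 1 and 2, and the baseline contribution plus the three SHAP values sums to the model output 8 on the instance.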

As a method for generating intuitive explanations of survival mortality model outputs, SHAP values feature attribution offers the advantage of satisfying three desirable properties for interpretability. First, this technique exactly matches the function it is explaining at x: f(x)=g(h_(x) ⁻¹(x)). Second, if a variable x_(i) is missing or unknown, its contribution Ø_(i) is 0. Third, for two models f and f′, if Ø_(i) ^(S)(f)≥Ø_(i) ^(S)(f′) for all subsets S, then Ø_(i) is at least as great for f as for f′.

In various embodiments, the choice of how to compute or estimate f(h_(x)(z_(S))) depends on what assumptions are made. This technique yields the expected model output on a data point when only the features in S are known:

${f\left( {h_{x}\left( z_{S} \right)} \right)} = {E\left\lbrack {f\left( {x_{- S}|x_{S}} \right)} \right\rbrack}$

The additive feature attribution technique assumes feature independence and model linearity, as in the Kernel SHAP method described by Lundberg and Lee. This embodiment can approximate this expected value as f([x_(S), E[x_(−S)]]) by swapping in expected values, or any other reference value, for the unknown input features.

Lundberg and Lee show that the sums which define each SHAP value are the solution to a constrained weighted linear regression problem on samples z∈{0,1}^(d) and labels f (h_(x)(z)), and this characteristic allows computation time to be reduced significantly with a sampling approximation.
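The regression view can be sketched in Python with NumPy as follows. This is not the sampling-based implementation of the SHAP package: for clarity it enumerates every z∈{0,1}^(d) rather than sampling, uses the Shapley kernel weight (d−1)/(C(d,|z|)·|z|·(d−|z|)) given by Lundberg and Lee, and enforces the two constraints (the intercept equals the baseline output, and the coefficients sum to f(x) minus the baseline output) by eliminating one coefficient. Absent features are replaced by baseline values, and the function name and toy model are hypothetical.

```python
import itertools
from math import comb

import numpy as np


def kernel_shap_enumerated(f, x, baseline):
    """SHAP values as the solution of a constrained weighted regression."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    d = len(x)
    if d < 2:
        raise ValueError("this sketch requires at least two features")
    f0 = float(f(baseline))
    total = float(f(x)) - f0  # the coefficients must sum to this value
    rows, targets, weights = [], [], []
    for z in itertools.product([0, 1], repeat=d):
        k = sum(z)
        if k == 0 or k == d:
            continue  # these two cases are handled by the constraints
        z = np.array(z, dtype=float)
        rows.append(z)
        targets.append(float(f(np.where(z == 1, x, baseline))) - f0)
        weights.append((d - 1) / (comb(d, k) * k * (d - k)))
    Z, y, w = np.array(rows), np.array(targets), np.array(weights)
    # Substitute phi_d = total - sum(phi_1..phi_{d-1}) into the model.
    A = Z[:, :-1] - Z[:, [-1]]
    b = y - Z[:, -1] * total
    sw = np.sqrt(w)
    sol, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    return f0, np.append(sol, total - sol.sum())


def toy_model(v):
    return 2 * v[0] + v[1] * v[2]


f0, phi = kernel_shap_enumerated(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

A sampling approximation would draw a subset of the rows z according to the kernel weights instead of enumerating all of them, trading exactness for computation time.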

In mapping from the interpretable space to the original data space, h_(x)(z) is defined so that for features i for which z_(i)=0, the corresponding data point x_(i) is imputed to be a median value or mode for an age-sex cohort. In the present disclosure, these imputations are sometimes called “baseline” values. Baseline values are also used to handle missing data in production versions of additive feature attribution for the predictive machine learning mortality models.
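Cohort-specific baseline values of this kind can be computed as in the following Python sketch, which takes the median of numeric fields and the mode of categorical fields within one age-sex cohort and then uses the result to fill in missing inputs. The record layout, the `age_band` and `sex` keys, and the function names are hypothetical.

```python
from statistics import median, mode


def cohort_baseline(records, age_band, sex, numeric_fields, categorical_fields):
    """Median (numeric) or mode (categorical) within one age-sex cohort."""
    cohort = [r for r in records if r["age_band"] == age_band and r["sex"] == sex]
    if not cohort:
        raise ValueError("no records in the requested cohort")
    baseline = {k: median([r[k] for r in cohort]) for k in numeric_fields}
    baseline.update({k: mode([r[k] for r in cohort]) for k in categorical_fields})
    return baseline


def impute_missing(record, baseline):
    """Replace missing (None or absent) fields with baseline values."""
    return {k: (record[k] if record.get(k) is not None else v)
            for k, v in baseline.items()}


# Invented toy records.
records = [
    {"age_band": "30-39", "sex": "F", "bmi": 22, "state": "NY"},
    {"age_band": "30-39", "sex": "F", "bmi": 24, "state": "NY"},
    {"age_band": "30-39", "sex": "F", "bmi": 30, "state": "CA"},
    {"age_band": "40-49", "sex": "M", "bmi": 35, "state": "TX"},
]
base = cohort_baseline(records, "30-39", "F", ["bmi"], ["state"])
filled = impute_missing({"bmi": None, "state": "CA"}, base)
```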

In various embodiments, groupings of variables are used to aggregate the signal from a family of variables into a single “interpretable feature.” This feature aggregation can make the explanations more intuitive, and can reduce the dimensionality of the interpretable feature space so that fewer samples are needed for the regression step. Feature groups can be hard-coded into model objects for the fluidless suite of underwriting models.

In an example, the machine learning predictive mortality modeling systems and methods of the present disclosure employed the R SHAP package in implementations of SHAP values feature attribution. Correctness tests showed that a sampling-based algorithm implemented by Lundberg and Lee in the Python object-oriented programming language, https://www.python.org, correctly recovered SHAP values in the simple case of linear regression predictive modeling.

In various embodiments, two algorithms for calculating SHAP values were implemented in Python: a sampling-based algorithm called “Kernel SHAP,” and a specialized algorithm called “Tree SHAP” that only works for tree-based models. The “Tree SHAP” algorithm can utilize structure from the model itself to better approximate E [f (x_(−S)|x_(S))] when feature independence is violated. Scott M. Lundberg, Gabriel G. Erion, Su-In Lee, “Consistent Individualized Feature Attribution for Tree Ensembles”; http://arxiv.org/abs/1802.03888 (2018). Comparing the results of the two algorithms on an XGBoost Cox survival model trained with NHANES data provided with the SHAP package showed that if the data is appropriately summarized in clusters used for the baseline input values, the L2 norm of the difference in the explanations from the two methods tends to shrink.

In an example, a demo algorithm for calculating interpretability data based on SHAP values was implemented in R. Corresponding pseudocode is described as follows.

    library(ISLR)
    library(dplyr)
    library(SHAP)
    library(randomForest)

    # Keep one randomly chosen row per automobile name
    dat <- Auto %>%
      mutate(origin = as.factor(origin),
             name = as.character(name)) %>%
      group_by(name) %>%
      sample_n(1) %>%
      ungroup()

    glimpse(dat)
    #> Observations: 301
    #> Variables: 9
    #> $ mpg          <dbl> 13.0, 15.0, 17.0, 24.3, 18.1, 20.2, 21.0, 18.0, 1...
    #> $ cylinders    <dbl> 8, 8, 8, 4, 6, 6, 6, 6, 6, 6, 8, 6, 6, 8, 4, 4, 4...
    #> $ displacement <dbl> 360, 390, 304, 151, 258, 232, 199, 199, 258, 232...
    #> $ horsepower   <dbl> 175, 190, 150, 90, 120, 90, 90, 97, 110, 100, 150...
    #> $ weight       <dbl> 3821, 3850, 3672, 3003, 3410, 3265, 2648, 2774, 2...
    #> $ acceleration <dbl> 11.0, 8.5, 11.5, 20.1, 15.1, 18.2, 15.0, 15.5, 13...
    #> $ year         <dbl> 73, 70, 72, 80, 78, 79, 70, 70, 71, 71, 74, 75, 7...
    #> $ origin       <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2...
    #> $ name         <chr> "amc ambassador brougham", "amc ambassador dpl",...

The demo algorithm was designed to generate an explanation of a random forest model for predicting mpg from other variables relating to an automobile, other than the automobile name:

    # Set train ids and train model
    train_ids <- sample(1:nrow(dat), size = floor(nrow(dat) * 0.75))
    train_dat <- dat %>% slice(train_ids) %>% select(-name)
    # Regress mpg on all remaining columns
    mdl <- randomForest(mpg ~ ., data = train_dat)

The algorithm defines a single-argument function that returns model predictions on an input dataframe:

    predict_fn <- function(x) predict(mdl, x)

The last part of the algorithm for obtaining an explanation sets “baseline” values for each input variable. This was done by applying the function summarize_mean_or_mode to the training data, creating a data instance in which each variable takes the mean value (if numeric) or mode. The model's prediction on this data instance is referred to as the “baseline” contribution:

    base <- summarize_mean_or_mode(train_dat %>% select(-mpg))
    head(base)
    #> # A tibble: 1 x 7
    #>   cylinders displacement horsepower weight acceleration  year origin
    #>       <dbl>        <dbl>      <dbl>  <dbl>        <dbl> <dbl> <fct>
    #> 1      5.52         196.       106.  3000.         15.5  76.2 1
    predict_fn(base)
    #>        1
    #> 19.95633

The interpretability model was applied to a test case. By default, the columns of the baseline case were used as candidate features to which contributions are assigned. The contribution values add up to the prediction output in a new row:

    test_case <- dat %>% slice(-train_ids) %>% sample_n(1)
    head(test_case)
    #> # A tibble: 1 x 9
    #>     mpg cylinders displacement horsepower weight acceleration  year origin
    #>   <dbl>     <dbl>        <dbl>      <dbl>  <dbl>        <dbl> <dbl> <fct>
    #> 1  25.5         4          140         89   2755         15.8    77 1
    #> # with 1 more variable: name <chr>
    predict_fn(test_case)
    #>        1
    #> 24.15757
    shap_explain(predict_fn, test_case, base) %>% round(3)
    #>     baseline    cylinders displacement   horsepower       weight
    #>       19.956        1.368        1.233        0.653        0.760
    #> acceleration         year       origin
    #>      -0.0570        0.245        0.000
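The additive property can be checked directly on the numbers printed in the listing above. The following Python sketch evaluates g(z) = Ø₀ + Σ Ø_(i) z_(i) for the rounded contribution values and compares the result with the model prediction for the test case; the function and variable names are hypothetical, and the small residual comes from rounding the contributions to three decimals.

```python
def additive_explanation(phi0, phi, z=None):
    """Evaluate g(z) = phi0 + sum_i phi_i * z_i.

    phi: dict of feature name -> contribution.
    z:   dict of feature name -> 0 or 1; defaults to all features present.
    """
    if z is None:
        z = {name: 1 for name in phi}
    return phi0 + sum(phi[name] * z[name] for name in phi)


# Rounded values copied from the shap_explain output for the test case.
baseline = 19.956
contributions = {
    "cylinders": 1.368,
    "displacement": 1.233,
    "horsepower": 0.653,
    "weight": 0.760,
    "acceleration": -0.057,
    "year": 0.245,
    "origin": 0.000,
}
reconstructed = additive_explanation(baseline, contributions)
prediction = 24.15757  # predict_fn(test_case) from the listing
residual = abs(reconstructed - prediction)
```

Setting every z_(i) to 0 returns the baseline prediction alone, which corresponds to the “null” case in which every feature takes its baseline value.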

In order to obtain explanations for a large number of rows, the algorithm incorporated the function get_all_explanations:

    test_dat <- dat %>% slice(-train_ids) %>% select(-mpg)
    get_all_explanations(predict_fn, head(test_dat), background_data = base)
    #>   baseline     cylinders displacement  horsepower      weight
    #> 1 19.95633  2.018393e-16   0.01171333  0.22220611  0.62843117
    #> 2 19.95633 -4.194062e-17   0.25727890  0.03605746 -0.08357722
    #> 3 19.95633 -5.291520e-01  -0.21168287 -1.91647893 -2.85656662
    #> 4 19.95633 -4.210254e-01  -0.20604466 -1.83928309 -1.46217009
    #> 5 19.95633  1.212861e+00   0.97064398 -0.01758733  1.87504649
    #> 6 19.95633  2.097031e-17   0.24408750 -0.05172722 -0.04993737
    #>    acceleration       year    origin
    #> 1 -3.405220e-17 -0.2534121  0.000000
    #> 2  4.250448e-01 -0.2554579  0.000000
    #> 3 -4.399291e-17 -0.2605424  0.000000
    #> 4  7.004933e-01 -0.1335891  0.000000
    #> 5  8.289111e-01  0.2805527 -0.475709
    #> 6  1.635000e-02  4.1754727  0.000000

In the event the inputs do not include a baseline set of values, get_all_explanations will summarize a dataframe passed into the algorithm:

    get_all_explanations(predict_fn, head(test_dat), id = 'name')
    #>                         name baseline  cylinders displacement horsepower
    #> 1                 amc hornet 20.09987  0.1981247   -0.4915910  0.7400795
    #> 2 amc hornet sportabout (sw) 20.09987 -0.2277588    0.1657227  0.4313756
    #> 3           amc matador (sw) 20.09987 -0.6973803   -0.3118325 -1.3014103
    #> 4              amc rebel sst 20.09987 -0.4808245   -0.6582439 -1.3659646
    #> 5                   bmw 320i 20.09987  1.1350284    0.7338807  0.4993306
    #> 6      buick century limited 20.09987  0.1116811   -0.4522107  0.6654143
    #>       weight acceleration       year       origin
    #> 1  1.2915819  -0.45426663 -0.8025443 -0.015980223
    #> 2  0.4208821  -0.21190377 -0.6799022  0.337395716
    #> 3 -2.8478405  -0.37542917 -0.7677830  0.383718225
    #> 4 -0.7021966   0.56034122 -0.9264443  0.068179180
    #> 5  2.0835170   0.74026148 -0.3831884 -0.277645095
    #> 6  0.3303416  -0.03553958  3.5776157 -0.006591287

In some applications, it can be desirable to pass in different baseline cases for each row. For example, in an underwriting model that evaluates mortality risk with respect to an age-sex cohort, the baseline should be age-sex cohort specific. In this event, the baseline can be supplied as a function that takes in a row and returns an appropriate baseline case for that row.

Disclosed embodiments apply additive feature attribution to interpretability of a mortality score and to outcomes of fluidless underwriting. In an example, the mortality score model outputs a quantitative score based on the scale from 0 to 100. The quantitative score 100 represents the lowest risk (healthiest) and the quantitative score 0 represents the most risky.

In order to facilitate analysis of the percentile risk score as an additive quantity, the method and system of the disclosure can set respective thresholds for different risk classes on this scale.

FIG. 8 illustrates an explanation 800 of a machine learning underwriting model prediction including a quantitative score, showing additive contributions to the quantitative score. These additive contributions quantify contributions of a set of fluidless underwriting features to the quantitative score. The quantitative score is a risk score 810 based on the scale from 0 to 100, where 100 represents the lowest risk (healthiest) and 0 represents the most risky. An additive feature attribution explanation model assigned each fluidless underwriting feature an importance value for this particular prediction. The diagram 800 represents additive contributions via vertical bars of respective lengths corresponding to the size of each contribution, in which upward bars have a positive value and downward bars have a negative value. Explanation 800 shows features 820, 830, and 840 that made positive contributions to the risk score 810, and features 850, 860, 870, and 880 that made negative contributions to the risk score 810. Summing 890 the effects of all feature attributions approximates the quantitative score output 810 of the fluidless underwriting predictive model.

As depicted, the analytical engine server (e.g., via the explanation model) may generate a report to be displayed for the user (e.g., end user and/or an administrator), such as the explanation 800. The explanation 800 may include an overall risk score 810, which is calculated based on positive and negative contributions (e.g., additive contributions). To further describe the risk score 810, the explanation 800 also displays categories that have positively or negatively impacted risk score 810 (e.g., features 820-880). Each feature may correspond to an additive contribution that is used to calculate the risk score 810. Thereby, viewing the features 820-880 allows the end user to understand and interpret the risk score 810.

Each feature may also include a graphical indicator corresponding to the magnitude of each respective feature and its impact on the score 810. For instance, the features depicted in FIG. 8 may have a corresponding vertical bar where the length of each bar represents the magnitude of each feature. In contrast, the direction of each vertical bar represents whether each feature has impacted the score 810 in a negative or positive way. Each feature may also include a numerical score. Using the graphical elements depicted in FIG. 8, the end user can easily interpret important factors that have caused the analytical engine server to calculate the risk score 810.

For instance, feature 830 graphically describes that the end user's weight has positively impacted his/her risk score. The height of the vertical bar for the feature 830 illustrates that the end user's weight has a more significant impact on the end user's score than his/her BMI. In contrast, feature 860 illustrates that the end user's bad debt has negatively impacted the user's risk score.

In some embodiments, the analytical engine server may allow the user to simulate future scores by adjusting one or more additive contributions depicted in the explanation 800. For instance, the features 820-880 may include an adjustable/interactive graphical element where the user can revise the scores to identify how the overall score would be impacted. For instance, the end user may change the score for past due balance (i.e., feature 880) by interacting with the feature 880 (e.g., interacting with the vertical bar and adjusting its height or direction). The analytical engine server may then dynamically execute various protocols described herein to recalculate the user's risk score and dynamically revise the explanation 800.

A model interpretability tool incorporated the interactive dashboard 900 shown in the screen shot of FIG. 9. This tool was used to explore contribution values of test cases passed through the fluidless model suite. For a given policy number and a mortality risk model, the tool displayed the SHAP values for each group and other information about the case's data.

The model interpretability tool 900 gives users a way to explore conditional signals that are aggregated into the SHAP value. Once an interpretable feature is selected, a user has the option to explore the marginal contributions ϕ_(i)^(S) that make up the SHAP value. Users can thereby explore how that group interacts with other variables. Using a form 950 “If these features had been taken from the baseline,” a user can choose a subset S of interpretable feature groups and see how this choice affects the output f(h_(x)(z_(−S))). The formula specifies −S to represent the interpretable features not taken from the baseline.

Using variable selection window 960, users can select an original feature i and optionally a subset S. Based on the selected features, the tool generates a plot 910 of the value of Hemoglobin A1C 914 against Percentile 918. Plot 910 shows how f(h_(x)(z_(−S))) changes as x_(i) changes. This function provides a way to define the marginal contribution f(h_(x)(z_(−S∪i)))−f(h_(x)(z_(−S))). The plot 910 also shows how f(h_(x)(z₀)) changes as x_(i) changes, thereby depicting the marginal contribution of feature i on the baseline or “null” case.
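The way marginal contributions aggregate into a SHAP value can be illustrated with a small, exact Shapley computation in Python. This is a sketch under stated assumptions, not the tool's implementation: the toy model, the case and baseline values, and all function names are hypothetical. In the code, `present` denotes the set of features taken from the case, with the remaining features taken from the baseline, and the helper `h` plays the role of the mapping h_(x). Exact enumeration is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial


def h(x, baseline, present):
    """Build a model input: features in `present` come from the case x,
    every other feature comes from the baseline."""
    return [x[j] if j in present else baseline[j] for j in range(len(x))]


def marginal_contribution(f, x, baseline, i, present):
    """Change in model output when feature i switches from baseline to case,
    given that the features in `present` are already taken from the case."""
    without_i = set(present) - {i}
    return f(h(x, baseline, without_i | {i})) - f(h(x, baseline, without_i))


def shap_value(f, x, baseline, i):
    """Exact Shapley value of feature i: the weighted average of its marginal
    contributions over all subsets of the other features."""
    n = len(x)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(n):
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            total += weight * marginal_contribution(f, x, baseline, i, subset)
    return total


def toy_model(v):
    # Hypothetical model with an interaction between features 0 and 1.
    return v[0] * v[1] + 2.0 * v[2]


case = [1.0, 2.0, 3.0]
baseline = [0.0, 1.0, 1.0]
phi = [shap_value(toy_model, case, baseline, i) for i in range(len(case))]
```

The contributions sum to the difference between the model output for the case and for the baseline, which is the additive property that lets the per-feature values be read as parts of a single score.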

The dashboard screenshot of FIG. 9 displays a case in which the mortality model penalizes an individual for a value of Hemoglobin A1C known to be in an unhealthy range. (Although this example displays a plot for a clinical risk factor, the same procedures apply to display of contribution values of non-clinical risk factors.) In plot 910, upper line 920 (“case”) shows how the individual's percentile changes with this variable. The line 920 shows that had the value been lower and everything else been the same, the person could have been below the threshold level 940 required for the best risk class. Lower line 930 (“baseline”) shows how the percentile of a baseline person changes with this variable.
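The two curves in such a plot can be produced by sweeping one variable while holding everything else fixed, once for the case and once for the baseline. The following Python sketch assumes a hypothetical scoring function `toy_percentile`, hypothetical feature values, and a hypothetical threshold; none of these numbers come from the mortality model.

```python
def sweep(f, point, i, grid):
    """Evaluate f with feature i replaced by each grid value, all else fixed."""
    curve = []
    for value in grid:
        v = list(point)
        v[i] = value
        curve.append(f(v))
    return curve


def toy_percentile(v):
    # Hypothetical stand-in for a percentile score: v[0] plays the role of the
    # swept lab value, v[1] summarizes every other risk factor.
    return max(0.0, min(100.0, 8.0 * v[0] + 10.0 * v[1]))


case = [8.0, 1.0]        # hypothetical applicant
baseline = [5.0, 0.0]    # hypothetical baseline person
grid = [4.0, 5.0, 6.0, 7.0, 8.0]

case_curve = sweep(toy_percentile, case, 0, grid)          # analogous to line 920
baseline_curve = sweep(toy_percentile, baseline, 0, grid)  # analogous to line 930

threshold = 60.0  # hypothetical analogue of threshold level 940
# Values of the swept variable at which the case would fall below the threshold.
qualifying_values = [g for g, p in zip(grid, case_curve) if p < threshold]
```

Reading the case curve against the threshold answers the counterfactual question posed in the text: which values of the variable, with everything else unchanged, would have placed the individual below the threshold.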

Table 970 shows upper and lower thresholds and meanings (“low,” “normal,” “at risk,” and “diabetic”) assigned by the mortality model to various value ranges of the variable Hemoglobin A1C. A dashboard panel 980 shows related information on mortality model contributions, e.g., that 34 of the individual's 84 percentile points are attributed, overall, to the sugar values shown at the top.

Users such as model developers can employ the explanation tool 900 to identify inconsistencies and undesirable behavior in machine learning underwriting algorithms. In an example, the tool 900 helped users visualize that prototype fluidless machine learning underwriting algorithms were applying unfavorable imputation rules to certain missing non-clinical variables.

While various aspects and embodiments have been disclosed, other aspects and embodiments are contemplated. The various aspects and embodiments disclosed are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

The foregoing method descriptions and the interface configuration are provided merely as illustrative examples and are not intended to require, or imply, that the steps of the various embodiments must be performed in the order presented. As will be appreciated by one of skill in the art, the steps in the foregoing embodiments may be performed in any order. Words such as “then,” “next,” etc., are not intended to limit the order of the steps; these words are simply used to guide the reader through the description of the methods. Although process flow diagrams may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination may correspond to a return of the function to the calling function or the main function.

The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed here may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Embodiments implemented in computer software may be implemented in software, firmware, middleware, microcode, hardware description languages, or any combination thereof. A code segment or machine-executable instructions may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc., may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.

The actual software code or specialized control hardware used to implement these systems and methods is not limiting of the invention. Thus, the operation and behavior of the systems and methods were described without reference to the specific software code, with it being understood that software and control hardware can be designed to implement the systems and methods based on the description here.

When implemented in software, the functions may be stored as one or more instructions or codes on a non-transitory computer-readable or processor-readable storage medium. The steps of a method or algorithm disclosed here may be embodied in a processor-executable software module, which may reside on a computer-readable or processor-readable storage medium. Non-transitory computer-readable or processor-readable media include both computer storage media and tangible storage media that facilitate transfer of a computer program from one place to another. Non-transitory processor-readable storage media may be any available media that may be accessed by a computer. By way of example, and not limitation, such non-transitory processor-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible storage medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer or processor. Disk and disc, as used here, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a non-transitory processor-readable medium and/or computer-readable medium, which may be incorporated into a computer program product.

What is claimed is:
 1. A method for processing an electronic application, comprising: receiving, by a processor, a plurality of variables of an electronic application from a user device, wherein the plurality of variables for the electronic application exclude clinical data for an applicant; upon receiving the plurality of variables for the electronic application from the user device, retrieving, by the processor, public data identified with the applicant of the electronic application from one or more third-party sources; executing, by the processor, a first predictive machine learning model by inputting selected features from the plurality of variables for the electronic application and the public data identified with the applicant of the electronic application to determine a first risk rank representative of a mortality risk for the electronic application and to classify the electronic application into one of a first high risk group and a first low risk group based upon the first risk rank, wherein the first predictive machine learning model is trained by inputting a plurality of historical application records; executing, by the processor, a second predictive machine learning model to determine a second risk rank and to classify the electronic application into one of a second high risk group and a second low risk group based upon the second risk rank; executing, by the processor, a third predictive machine learning model to determine a third risk rank and to classify the electronic application into one of a third high risk group and a third low risk group based upon the third risk rank; and executing, by the processor, an explanation model configured to generate an explanation file that includes additive contributions of the features selected from the plurality of variables for the electronic application and the public data identified with the applicant by inputting the selected features into an additive feature attribution model; and generating, by the processor, explanation data derived from the explanation file for display on a user interface, wherein executing the explanation model generates, for display on the user interface, an explanation dashboard to receive a selection input of at least one of the selected features, wherein the explanation dashboard displays a graphical representation of marginal contributions of the at least one of the selected features.
 2. The method of claim 1, wherein the additive feature attribution model executes a SHAP values (SHapley Additive exPlanation) algorithm.
 3. The method of claim 1, wherein the additive feature attribution model executes a Kernel SHAP (SHapley Additive exPlanation) algorithm.
 4. The method of claim 1, wherein the additive feature attribution model executes a Tree SHAP (SHapley Additive exPlanation) algorithm.
 5. The method of claim 1, wherein during training of the first predictive machine learning model each historical application record is supplemented with public data identified with an applicant of the respective historical application record received from the one or more third-party sources.
 6. The method of claim 5, wherein the public data identified with the applicant of the electronic application, and the public data identified with the respective applicant of each historical application record, comprise public records and credit risk data.
 7. A method for processing an electronic application, comprising: receiving, by a processor, a plurality of variables of an electronic application from a user device, wherein the plurality of variables for the electronic application exclude clinical data for an applicant; upon receiving the plurality of variables for the electronic application from the user device, retrieving, by the processor, public data identified with the applicant of the electronic application from one or more third-party sources; executing, by the processor, a first predictive machine learning model by inputting selected features from the plurality of variables for the electronic application and the public data identified with the applicant of the electronic application to determine a first risk rank representative of a mortality risk for the electronic application and to classify the electronic application into one of a first high risk group and a first low risk group based upon the first risk rank, wherein the first predictive machine learning model is trained by inputting engineered features and customer profile data; executing, by the processor, a second predictive machine learning model to determine a second risk rank and to classify the electronic application into one of a second high risk group and a second low risk group based upon the second risk rank; executing, by the processor, a third predictive machine learning model to determine a third risk rank and to classify the electronic application into one of a third high risk group and a third low risk group based upon the third risk rank; and executing, by the processor, an explanation model configured to generate an explanation file that includes additive contributions of the features selected from the plurality of variables for the electronic application and the public data identified with the applicant by inputting the selected features into an additive feature attribution model; and generating, by the processor, explanation data derived from the explanation file for display on a user interface, wherein executing the explanation model generates, for display on the user interface, an explanation dashboard to receive a selection input of at least one of the selected features, wherein the first risk rank comprises a percentile risk score of the electronic application, wherein the explanation dashboard displays a graphical representation of change of the percentile risk score with changed value of the at least one of the selected features.
 8. The method of claim 7, wherein when the processor classifies the electronic application into one or more of the first high risk group, the second high risk group, and the third high risk group, the explanation file further includes a holistic explanation of the first predictive machine learning model, the second predictive machine learning model, and the third predictive machine learning model.
 9. The method of claim 7, wherein the explanation dashboard further displays a graphical representation of change of the percentile risk score with changed value of a baseline case of the at least one of the selected features.
 10. The method of claim 7, wherein the second risk rank is representative of propensity of the applicant of the electronic application to be a smoker.
 11. The method of claim 7, wherein the third predictive machine learning model determines disqualifying medical risks based on information derived from prescription drug fills for the applicant of the electronic application.
 12. A system comprising: an analytical engine server containing a processor configured to execute a plurality of non-transitory computer-readable instructions configured to: receive a plurality of variables for an electronic application from a user device that excludes clinical data for an applicant of the electronic application, and retrieve public data identified with the applicant of the received electronic application from one or more third-party sources; execute a predictive machine learning module to determine a mortality risk rank for the electronic application and classify the electronic application into a first low risk group or a first high risk group; execute a smoking propensity predictive model, wherein the smoking propensity model is configured to estimate a propensity of the applicant of the electronic application to be a smoker and determine a smoking/non-smoking binary target; execute a prescription drug data predictive model configured to determine a disqualifying medical risk based on information derived from prescription drug fills for the applicant of the electronic application; execute an explanation model configured to generate an explanation file that includes additive contributions of one or more variables of the plurality of variables of the electronic application by inputting features representative of at least some of the plurality of variables of the electronic application into an additive feature attribution model; and generate explanation data derived from the explanation file for display on a user interface, wherein the explanation data comprises an explanation dashboard that displays a graphical representation of marginal contributions of the features representative of at least some of the plurality of variables of the electronic application.
 13. The system of claim 12, wherein the additive feature attribution model executes a SHAP values (SHapley Additive exPlanation) algorithm.
 14. The system of claim 12, wherein the additive feature attribution model executes a Kernel SHAP (SHapley Additive exPlanation) algorithm.
 15. The system of claim 12, wherein the additive feature attribution model executes a Tree SHAP (SHapley Additive exPlanation) algorithm.